Recent advances in the cognitive neuroscience of language have embraced naturalistic stimuli such as movies and audiobooks. However, most open-access neuroimaging datasets still focus on single-speaker scenarios, falling short of capturing the complexity of real-life, multi-speaker communication. To address this gap, we present the BABA fMRI and MEG dataset, collected while participants watched a 25-minute excerpt from a Chinese reality TV show featuring 11 speakers, including five fathers and their children. Set in a rural village, the show captures natural parent-child interactions with spontaneous, emotionally rich, and socially dynamic dialogue. Unlike scripted or single-speaker recordings, the dialogue includes overlapping speech, speaker switches, and interruptions, offering a more ecologically valid stimulus. The combined use of fMRI and MEG allows researchers to examine both the spatial and the temporal dynamics of language processing. This dataset provides a valuable resource for investigating the neural mechanisms underlying multi-talker comprehension, attentional shifts, and real-world social communication.