Infants preferentially process familiar social signals, but the neural mechanisms underlying continuous processing of maternal speech remain unclear. Using neural encoding models based on temporal response functions (TRFs) applied to EEG, we investigated how 7-month-old infants track maternal versus unfamiliar speech and whether voice familiarity affects simultaneous face processing. Infants showed stronger neural tracking of their mother's voice, independent of its acoustic properties, suggesting an early neural signature of voice familiarity. Face-tracking responses differed depending on the voice infants heard: when listening to a stranger's voice, face-tracking accuracy at central electrodes increased with occipital face tracking, suggesting heightened attentional engagement. However, we found no evidence for differential processing of happy versus fearful faces, contrasting with previous findings on early emotion discrimination. Our results reveal interactive effects of voice familiarity on multimodal processing in infancy: while maternal speech enhances neural tracking, it may also alter how other social cues, such as faces, are processed. These findings suggest that early auditory experience shapes how infants allocate cognitive resources to social stimuli, underscoring the need to consider cross-modal influences in early development.
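The TRF approach mentioned above can be illustrated with a minimal sketch: an encoding model maps a stimulus feature (e.g., the speech envelope) to EEG via time-lagged ridge regression, and "neural tracking" is scored as the correlation between predicted and recorded EEG. This is a simplified, single-channel illustration on synthetic data, not the study's actual pipeline; all variable names, lag ranges, and the regularization value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lagged_design(stim, lags):
    """Stack time-shifted copies of the stimulus as regressor columns."""
    n = len(stim)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = stim[:n - lag] if lag > 0 else stim
    return X

def fit_trf(stim, eeg, lags, lam=1.0):
    """Ridge regression: w = (X'X + lam*I)^(-1) X'y."""
    X = lagged_design(stim, lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)

def tracking_accuracy(stim, eeg, w, lags):
    """Correlation between TRF-predicted and recorded EEG.
    (In practice this is computed on held-out data via cross-validation.)"""
    pred = lagged_design(stim, lags) @ w
    return np.corrcoef(pred, eeg)[0, 1]

# Synthetic example: EEG = envelope convolved with a decaying kernel + noise.
n, fs = 2000, 100                       # samples, sampling rate (Hz)
envelope = rng.standard_normal(n)
true_kernel = np.exp(-np.arange(10) / 3.0)
eeg = np.convolve(envelope, true_kernel, mode="full")[:n]
eeg += 0.5 * rng.standard_normal(n)

lags = np.arange(20)                    # 0-190 ms of stimulus history at 100 Hz
w = fit_trf(envelope, eeg, lags)
r = tracking_accuracy(envelope, eeg, w, lags)
print(f"tracking accuracy r = {r:.2f}")
```

In this framing, the abstract's key result corresponds to a higher tracking correlation `r` for the mother's speech envelope than for a stranger's, after controlling for acoustics.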