Reading and speech recognition rely on multi-level processing that builds from basic visual or auditory features up to whole-word representations, yet the details of these processing hierarchies, in particular those for spoken words, remain poorly understood. We re-analyzed the functional magnetic resonance imaging (fMRI) data from the Mother Of all Unification Studies (MOUS) open-science dataset, using parametric regressions of word frequency and of sublexical unit (bigram or syllable) frequency during reading and speech-listening tasks, in order to elucidate the lexical processing hierarchies of the visual and auditory modalities. We first validated the approach in the written-word domain: word frequency correlated significantly with activity in the left mid-fusiform cortex, at the location of the Visual Word Form Area, while a left occipital region tracked bigram frequency, consistent with prior reports. During listening, low-frequency spoken words elicited greater responses in a left mid-superior temporal region consistent with the recently described Auditory Word Form Area (AWFA), while a more posterior region of the superior temporal gyrus was sensitive to syllable frequency. Activation in the left inferior frontal gyrus correlated with both written and spoken word frequency. These findings demonstrate parallel hierarchical organizations in the anteroventral visual and auditory streams, with modality-specific lexica and upstream sublexical representations that converge in higher-order language areas.
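
For readers unfamiliar with parametric modulation, the sketch below illustrates how word-level and sublexical-level frequency regressors can enter a first-level fMRI design of the kind named above. It is a minimal illustration, not the authors' analysis code: the use of nilearn, the toy event timings, the repetition time, and all variable names are assumptions introduced here for demonstration only.

```python
# Illustrative sketch of a parametric-modulation design matrix (assumptions:
# nilearn, toy timings, TR = 2 s; not the authors' actual pipeline).
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

rng = np.random.default_rng(0)
n_words, t_r, n_scans = 200, 2.0, 300
frame_times = np.arange(n_scans) * t_r

# Toy word events: onsets, durations, and log frequencies standing in for
# corpus-derived lexical and bigram/syllable frequencies.
onsets = np.sort(rng.uniform(0, n_scans * t_r - 10, n_words))
durations = np.full(n_words, 0.4)
log_word_freq = rng.normal(3.0, 1.0, n_words)
log_sublex_freq = rng.normal(4.0, 1.0, n_words)

def modulated_events(name, values):
    """Event set whose HRF response is scaled by a mean-centered value."""
    return pd.DataFrame({"onset": onsets, "duration": durations,
                         "trial_type": name,
                         "modulation": values - values.mean()})

# One unmodulated "word" regressor plus one parametric regressor per
# frequency measure; nilearn scales each event by its "modulation" value.
events = pd.concat([
    pd.DataFrame({"onset": onsets, "duration": durations,
                  "trial_type": "word", "modulation": 1.0}),
    modulated_events("word_freq", log_word_freq),
    modulated_events("sublex_freq", log_sublex_freq),
], ignore_index=True)

design = make_first_level_design_matrix(frame_times, events,
                                        hrf_model="glover")
print(design.columns.tolist())  # word, word_freq, sublex_freq, drifts, constant
```

In a full analysis, such a design matrix would be fit voxel-wise to the BOLD time series and the `word_freq` and `sublex_freq` contrasts would identify the frequency-sensitive regions described in the abstract.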