Music is distinguished from other natural sounds by the presence of relatively discrete notes, which are organized across pitch and time to convey melody, harmony, and rhythm. Growing evidence suggests that small clusters of neural populations within anterior and posterior human non-primary auditory cortex respond selectively to music. However, it is unclear whether this selectivity reflects short-term musical structure at the level of individual notes, the patterning of notes in pitch and time, or both. We used fMRI voxel decomposition to measure the responses of music-selective and non-selective auditory neural populations to synthetic music and drum stimuli whose notes were scrambled in pitch and/or time, disrupting musical pattern structure while largely preserving note-level structure. We observed reliably stronger responses to music with intact pitch and temporal pattern structure in both anterior and posterior music-selective regions bilaterally, but little difference between intact and scrambled music in non-selective populations. Further, only music-selective populations showed reliably stronger responses to note-scrambled music than to non-music sounds. These results suggest that musical structure involving both individual notes and their patterning over time is specifically represented in localized music-selective neural populations of human non-primary auditory cortex.