The ability of humans to discriminate and identify spatial patterns varies across the visual field, and is generally worse in the periphery than in the fovea. This decline in performance is revealed in many kinds of tasks, from detection to recognition. A parsimonious hypothesis is that the representation of any visual feature is blurred (spatially averaged) by an amount that differs for each feature, but that in all cases increases with eccentricity. Here, we examine models for two such features: local luminance and spectral energy. Each model averages the corresponding feature in pooling windows whose diameters scale linearly with eccentricity. We performed perceptual experiments with synthetic stimuli to determine the largest window scaling for which human and model discrimination abilities match (the "critical" scaling). We used much larger stimuli than those of previous studies, subtending 53.6 by 42.2 degrees of visual angle. We found that the critical scaling for the luminance model was approximately one-fourth that of the energy model and, consistent with earlier studies, that the estimated critical scaling value was smaller when discriminating a synthesized stimulus from a natural image than when discriminating two synthesized stimuli. Moreover, we found that initializing the generation of the synthesized images with natural images reduced the critical scaling value when discriminating two synthesized stimuli, but not when discriminating a synthesized stimulus from a natural image. Together, the results show that critical scaling is strongly affected by the image statistic (pooled luminance vs. spectral energy), the comparison type (synthesized vs. synthesized or synthesized vs. natural), and the initialization image for synthesis (white noise vs. natural image). We offer a coherent explanation for these results in terms of alignments and misalignments of the models with human perceptual representations.
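To make the pooling operation concrete, the sketch below illustrates the core idea of averaging a feature map (here, local luminance) within windows whose diameters grow linearly with eccentricity, controlled by a single scaling parameter. This is a minimal illustration under simplifying assumptions: it uses hard-edged, circular windows laid out on log-spaced rings, whereas the actual models use smoothly overlapping windows; the function names, window layout, and parameter values are illustrative choices, not the paper's implementation.

```python
import numpy as np

def pooling_window_diameter(eccentricity_deg, scaling):
    """Window diameter (deg) grows linearly with eccentricity."""
    return scaling * eccentricity_deg

def pooled_statistic(feature, scaling, px_per_deg, min_ecc=0.5, max_ecc=26.0):
    """Average a feature map (e.g., local luminance) within circular pooling
    windows whose diameters scale linearly with eccentricity.

    Returns a list of (center_x_deg, center_y_deg, pooled_value) tuples.
    Illustrative sketch only: real pooling models use smooth, overlapping windows.
    """
    h, w = feature.shape
    # pixel coordinates in degrees, with the fovea at the image center
    ys, xs = np.mgrid[0:h, 0:w]
    x_deg = (xs - w / 2) / px_per_deg
    y_deg = (ys - h / 2) / px_per_deg
    ecc = np.hypot(x_deg, y_deg)  # eccentricity of every pixel

    stats = []
    # log-spaced rings of window centers: window size doubles as eccentricity doubles
    ring_eccs = np.geomspace(min_ecc, max_ecc, num=12)
    for e in ring_eccs:
        diam = pooling_window_diameter(e, scaling)
        # enough wedges around the ring that adjacent windows roughly tile it
        n_wedges = max(4, int(np.round(2 * np.pi * e / diam)))
        for k in range(n_wedges):
            theta = 2 * np.pi * k / n_wedges
            cx, cy = e * np.cos(theta), e * np.sin(theta)
            mask = np.hypot(x_deg - cx, y_deg - cy) <= diam / 2
            if mask.any():
                stats.append((cx, cy, feature[mask].mean()))
    return stats

# Toy usage: pool the "luminance" of a random image at two candidate scalings.
rng = np.random.default_rng(0)
img = rng.random((422, 536))  # stand-in for a 53.6 x 42.2 deg stimulus at 10 px/deg
coarse = pooled_statistic(img, scaling=0.5, px_per_deg=10)
fine = pooled_statistic(img, scaling=0.125, px_per_deg=10)
print(len(coarse), len(fine))  # smaller scaling -> more, smaller windows
```

In this framing, the critical scaling is the largest value of the scaling parameter for which observers cannot reliably distinguish images that the model maps to the same set of pooled values.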