A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which is not available in some applications. Deep learning-based de novo sequencing can address this limitation by predicting peptide sequences directly from MS2 data. Recent applications of the transformer architecture to de novo sequencing have produced state-of-the-art results on the so-called nine-species benchmark. In this study, we propose an improved transformer encoder inspired by the heuristics used in the manual interpretation of spectra. We modify the attention mechanism with a learned bias based on pairwise mass differences, termed Pairwise Attention (PA). Adding PA improves average peptide precision at 100% coverage by 12.7% (5.9 percentage points) over our base transformer on the original nine-species benchmark. We also achieve a 7.4% increase over the previously published model Casanovo. Our MS2 encoding strategy is largely orthogonal to those of other transformer-based models that encode MS2 spectra, enabling straightforward integration into existing deep-learning approaches. Our results show that integrating domain-specific knowledge into transformers boosts de novo sequencing performance.
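
To make the idea of an attention bias derived from pairwise mass differences concrete, the following is a minimal single-head NumPy sketch. It is an illustration under stated assumptions and not the model described here. The function names (`pairwise_bias`, `biased_attention`), the Gaussian featurization of the mass differences, the `width` parameter, and the use of amino acid residue masses as feature centers are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_bias(mz, w_bias, centers, width):
    # mz: (n,) peak m/z values; centers: (k,) reference mass differences.
    # Signed difference between every pair of peaks, shape (n, n).
    diff = mz[:, None] - mz[None, :]
    # ASSUMPTION: Gaussian features around each reference mass, shape (n, n, k).
    feats = np.exp(-((diff[..., None] - centers) ** 2) / (2.0 * width ** 2))
    # Learned weights w_bias (k,) reduce the features to one scalar per pair.
    return feats @ w_bias

def biased_attention(x, mz, wq, wk, wv, w_bias, centers, width=0.5):
    # x: (n, d) peak embeddings; wq, wk, wv: (d, d) projection matrices.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Add the mass-difference bias to the attention logits before softmax.
    scores = scores + pairwise_bias(mz, w_bias, centers, width)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

# Hypothetical usage with random inputs.
rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))
mz = np.sort(rng.uniform(100.0, 1000.0, size=n))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
# Monoisotopic residue masses of Gly, Ala, Ser, Pro, used as example centers.
centers = np.array([57.02146, 71.03711, 87.03203, 97.05276])
w_bias = rng.normal(size=centers.shape[0])
out, attn = biased_attention(x, mz, wq, wk, wv, w_bias, centers)
```

Because the bias in this sketch is added only to the attention logits, setting `w_bias` to zero recovers standard scaled dot-product attention.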