Integrating spatial transcriptomics with antibody-based proteomics enables the investigation of biological regulation within intact tissue architecture. However, current approaches for spatial multi-omics integration often depend on dimensionality reduction or autoencoders, which disregard spatial context, limit interpretability, and face challenges with scalability. To address these limitations, we developed INLAomics, a multivariate hierarchical Bayesian framework that models protein abundance in tissue sections by leveraging histological features and latent spatial factors inferred from spatial transcriptomics data. INLAomics supports two key applications: (1) identifying spatial gene co-expression programs to build interpretable gene protein networks, and (2) predicting spatial protein expression in tissues lacking proteomics measurements. Applied across diverse datasets, INLAomics reveals previously unrecognized gene protein associations and achieves substantial improvements in protein prediction accuracy over models that treat each modality independently. The framework is both computationally efficient and biologically interpretable, offering a scalable solution for integrative analysis of spatial multi-omics data