2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 17th, 2025
Version: 1
University of Colorado Boulder
bioinformatics
biorxiv

A periodic table of bacteria?: Mapping bacterial diversity in trait space

Hoffert, M. C. • Lladser, M. E. • Gorman, E. D. • Fierer, N.

Bacterial diversity can be overwhelming. There is an ever-expanding number of bacterial taxa being discovered, but many of these taxa remain uncharacterized with unknown traits and environmental preferences. This diversity makes it challenging to interpret ecological patterns in microbiomes and understand why individual taxa, or assemblages, may vary across space and time. While we can use information from the rapidly growing databases of bacterial genomes to infer traits, we still need an approach to organize what we know, or think we know, about bacterial taxa to match taxonomic and phylogenetic information to trait inferences. Inspired by the periodic table of the elements, we have constructed a 'periodic table' of bacterial taxa to organize and visualize monophyletic groups of bacteria based on the distributions of key traits predicted from genomic data. By analyzing 50,745 genomes across 31 bacterial phyla, we used the Haar-like wavelet transformation, a model-free transformation of trait data, to identify clades of bacteria that are nearly uniform with respect to six selected traits: oxygen tolerance, autotrophy, chlorophototrophy, maximum potential growth rate, GC content and genome size. The identified functionally uniform clades of bacteria are presented in a concise 'periodic table'-like format to facilitate identification and exploration of bacterial lineages in trait space. While our approach could be improved and expanded in the future, we demonstrate its utility for integrating phylogenetic information with genome-derived trait values to improve our understanding of the bacterial diversity found in environmental and host-associated microbiomes.
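To make the idea of a Haar-like wavelet transformation on a phylogeny more concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes a rooted binary tree, ignores branch lengths, and uses one common unweighted form of the coefficient at an internal node: sqrt(nL*nR/(nL+nR)) times the difference between the mean trait values of its left and right subtrees, where nL and nR are leaf counts. The names `Node`, `haar_like`, and `uniform_clades`, the tolerance `tol`, and the toy trait values are all hypothetical. If every coefficient inside a clade is (near) zero, the trait is (near) constant across that clade's leaves.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Node of a rooted binary tree; only leaves carry a trait value."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None


def is_leaf(node: Node) -> bool:
    return node.left is None and node.right is None


def haar_like(node: Node, coefs: Dict[str, float]) -> Tuple[int, float]:
    """Post-order pass: return (leaf count, trait sum) for the subtree and
    store one unweighted Haar-like coefficient per internal node."""
    if is_leaf(node):
        return 1, float(node.value)
    n_l, s_l = haar_like(node.left, coefs)
    n_r, s_r = haar_like(node.right, coefs)
    # Scaled difference between the mean trait of the two child clades.
    coefs[node.name] = sqrt(n_l * n_r / (n_l + n_r)) * (s_l / n_l - s_r / n_r)
    return n_l + n_r, s_l + s_r


def uniform_clades(root: Node, tol: float = 1e-9) -> List[str]:
    """Return the largest clades whose coefficients are all within tol of zero.
    (Simple quadratic-time search; adequate for a toy example.)"""
    coefs: Dict[str, float] = {}
    haar_like(root, coefs)

    def all_small(node: Node) -> bool:
        if is_leaf(node):
            return True
        return (abs(coefs[node.name]) <= tol
                and all_small(node.left) and all_small(node.right))

    found: List[str] = []

    def walk(node: Node) -> None:
        if is_leaf(node):
            return
        if all_small(node):
            found.append(node.name)  # maximal uniform clade; do not descend
        else:
            walk(node.left)
            walk(node.right)

    walk(root)
    return found


# Toy tree ((A, B)X, (C, D)Y)R with a made-up binary trait (1 = aerobic).
root = Node("R",
            left=Node("X", Node("A", value=1.0), Node("B", value=1.0)),
            right=Node("Y", Node("C", value=0.0), Node("D", value=1.0)))

coefs: Dict[str, float] = {}
haar_like(root, coefs)
print(coefs)
print(uniform_clades(root))
```

In this toy tree only clade X is uniform for the trait, since its two leaves share the same value, while Y and the root both have non-zero coefficients. A looser `tol` would admit "nearly uniform" clades in the sense the abstract describes, though how the authors set such a threshold is not stated here.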
