2025 • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

May 9th, 2025
Version: 2
American Museum of Natural History
genomics
biorxiv

The information content of species: formal definitions of pangenome complexity track with bacterial lifestyle

Narechania, A. • Bobo, D. • Gilbert, T. • Gopalakrishnan, S.

Genes and other genomic elements show variable presence/absence patterns across most bacterial species. Pangenome fluidity is often invoked to measure this genome flux. Fluid pangenomes contain genes found only in subsets of a species' strains. Tighter pangenomes contain more genes that define a shared core. Species definitions are often tied to this pangenome diversity. In any global comparative framework, pangenomes must be calculated across all known species. But defining pangenomes is fraught with computational and biological challenges, requiring assembly, annotation, alignment, and phylogenetic analysis of millions of orthologs. We offer an alternative view that de-centers the gene and emphasizes the raw information content of sequences. Information is data that reduces uncertainty. Tight pangenomes, with elements repeated across every strain in a species ensemble, contain more complete information. In contrast, fluid pangenomes have more uncertainty, higher complexity, and higher information diversity. Bacterial lifestyle has been shown to drive this information diversity. For example, challenging environments often increase information diversity by encouraging the accrual of auxiliary genes. Here, we employ agile complexity metrics to quantify this increase. Ensembles of free-living, motile, and non-pathogenic species have high genomic complexity. Ensemble complexity decreases in species bound to specific hosts. Because we eliminate annotation and alignment, our method is fast enough to evaluate species across all known bacterial genomes. The approach democratizes classification, and our results highlight how broad the term "species" has become.
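The abstract contrasts gene-based fluidity with annotation-free measures of information content but does not spell out the complexity metrics used. The Python sketch below is an illustration under stated assumptions, not the authors' method. It treats each genome as a set of k-mers (an assumed stand-in for gene families, so no annotation or alignment is needed), computes the pairwise genomic fluidity defined by Kislyuk et al. (2011) over those sets, and adds a hypothetical presence/absence entropy in which core elements shared by every strain contribute no uncertainty. The function names, the toy sequences, and the choice of k are illustrative assumptions.

```python
from itertools import combinations
from math import log2


def kmer_set(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers (substrings of length k) in seq.

    Assumption: k-mers stand in for annotated gene families.
    """
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fluidity(element_sets: list[set[str]]) -> float:
    """Genomic fluidity (Kislyuk et al. 2011) over arbitrary element sets.

    Averages, over all unordered pairs of genomes, the number of elements
    unique to either genome divided by the total elements in both genomes.
    0.0 means every genome carries the same elements.
    """
    if len(element_sets) < 2:
        raise ValueError("fluidity needs at least two genomes")
    ratios = []
    for a, b in combinations(element_sets, 2):
        unique = len(a - b) + len(b - a)
        total = len(a) + len(b)
        ratios.append(unique / total if total else 0.0)
    return sum(ratios) / len(ratios)


def presence_entropy(element_sets: list[set[str]]) -> float:
    """Hypothetical metric: mean Shannon entropy (bits) of each element's
    presence/absence across the ensemble.

    Core elements (present in every genome) contribute 0 bits; an element
    present in exactly half the genomes contributes the maximum of 1 bit.
    """
    n = len(element_sets)
    counts: dict[str, int] = {}
    for s in element_sets:
        for e in s:
            counts[e] = counts.get(e, 0) + 1
    if not counts:
        return 0.0
    total = 0.0
    for c in counts.values():
        p = c / n
        if 0 < p < 1:
            total += -p * log2(p) - (1 - p) * log2(1 - p)
    return total / len(counts)


if __name__ == "__main__":
    # Toy ensembles (hypothetical sequences): identical strains vs. divergent strains.
    tight = ["ACGTACGTGG"] * 3
    fluid = ["ACGTACGTGG", "TTGACCATGC", "ACGTTTGACA"]
    for name, ensemble in [("tight", tight), ("fluid", fluid)]:
        sets = [kmer_set(s, k=4) for s in ensemble]
        print(name, round(fluidity(sets), 3), round(presence_entropy(sets), 3))
    # The tight ensemble prints "tight 0.0 0.0"; the fluid one prints positive values.
```

Using k-mer sets as the elements is what removes the annotation and alignment steps; the trade-off is that the choice of k and sequencing errors both shift the counts. Real genomes would also call for canonical (reverse-complement-collapsed) k-mers, which this sketch omits.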

Similar Papers

biorxiv
Fri May 09 2025
A comprehensive water buffalo pangenome reveals extensive structural variation linked to population specific signatures of selection
Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization, complicated by the divergent karyotypes of its two sub-species (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for the use of graph genomics, which can capture variation missed by linear reference approaches....
Arshad, F. • Jayaraman, S. • Talenti, A. • Owen, R. • ... • Prendergast, J. G.
biorxiv
Fri May 09 2025
Natural variation in chalcone isomerase defines a major locus controlling radial stem growth variation among Populus nigra populations
Poplar is a promising resource for wood production and the development of lignocellulosic biomass, but currently available varieties have not been optimized for these purposes. Therefore, it is critical to investigate the genetic variability and mechanisms underlying traits that affect biomass yield. Previous studies have shown that target traits in different poplar species are complex, with a sma...
Durufle, H. • Dejardin, A. • Jorge, V. • Pegard, M. • ... • Segura, V.
biorxiv
Fri May 09 2025
Spatially varying graph estimation for spatial transcriptomics cancer data
Modern spatial transcriptomic profiling techniques facilitate spatially resolved, high-dimensional assessment of cellular gene transcription across the tumor domain. The characterization of spatially varying gene networks enables the discovery of heterogeneous regulatory patterns and biological mechanisms underlying cancer etiology. We propose a spatial Graphical Regression (sGR...
Acharyya, S. • Kang, J. • Baladandayuthapani, V.
biorxiv
Fri May 09 2025
Genome Dynamics and Chromosome Structural Variations in Histoplasma ohiense, a fungal pathogen of humans
Histoplasma is a clinically important but understudied genus of thermally dimorphic human fungal pathogens. Histoplasma species normally transition between a multicellular sporulating hyphal form in the soil and a unicellular pathogenic yeast form in a mammalian host. Little is known about genome plasticity of Histoplasma, which we address in this study with the ultimate goal of increasing our und...
Heater, S. • Voorhies, M. • Sil, A.
biorxiv
Thu May 08 2025
All of Us diversity and scale improve polygenic prediction contextually with greatest improvements for underrepresented populations
Recent studies have demonstrated that polygenic risk scores (PRS) trained on multi-ancestry data can improve prediction accuracy in groups historically underrepresented in genomic studies, but the availability of linked health and genetic data from large-scale diverse cohorts representative of a wide spectrum of human diversity remains limited. To address this need, the All of Us research program ...
Tsuo, K. • Shi, Z. • Ge, T. • Mandla, R. • ... • Martin, A. R.
biorxiv
Thu May 08 2025
EnrichSci: Transcript-guided Targeted Cell Enrichment for Scalable Single-Cell RNA Sequencing
Large-scale single-cell atlas efforts have revealed many aging- or disease-associated cell types, yet these populations are often underrepresented in heterogeneous tissues, limiting detailed molecular and dynamic analyses. To address this, we developed EnrichSci - a highly scalable, microfluidics-free platform that combines Hybridization Chain Reaction RNA FISH with combinatorial indexing to profi...
Liao, A. • Zhang, Z. • Sziraki, A. • Abdulraouf, A. • ... • Cao, J.
biorxiv
Thu May 08 2025
Integrative analysis of RNA binding proteins identifies DDX55 as a novel regulator of 3'UTR isoform diversity
The 3' untranslated regions (3'UTRs) of mRNAs play a critical role in controlling gene expression and function because they contain binding sites for microRNAs and RNA binding proteins (RBPs) that alter mRNA stability, localization, and translation. Most mRNA 3' ends contain multiple polyadenylation sites (PAS) that can be utilized in condition-specific manners, a process known as alternative p...
Gazzara, M. R. • Cater, T. • Mallory, M. J. • Barash, Y. • Lynch, K. W.
biorxiv
Thu May 08 2025
Cryptic diversity arises from glacial cycles in Pacific herring, a critical forage fish
Forage fishes are biological drivers throughout the Pacific Ocean, from the Arctic to nearly subtropical latitudes. As a critical trophic link, the health and stability of Pacific herring (Clupea pallasii) populations have implications for other marine species, including several targeted by large, productive fisheries. Previous research has indicated marked divergence between Pacific herring in th...
Timm, L. E. • Almgren, S. A. • Lopez, J. A. • Glass, J. R.
biorxiv
Thu May 08 2025
Identification of a novel transcriptome signature for predicting the response to anti-TNF-α treatment in rheumatoid arthritis patients
Objectives: To identify and validate a transcriptomic signature capable of predicting the response to anti-tumour necrosis factor (TNF) therapy in patients with rheumatoid arthritis (RA) before treatment initiation. Methods: We performed a retrospective transcriptomic analysis using two public datasets, RNA-seq data from peripheral blood mononuclear cells in GSE138746 and microarray data from whole...
Pena, R. D.