2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 18th, 2025
Version: 1
University of Zurich
bioinformatics
biorxiv

The MicrobeAtlas database: Global trends and insights into Earth's microbial ecosystems

Rodrigues, J. R. F. M. • Tackmann, J. • Malfertheiner, L. • Patsch, D. • Perez Molphe Montoya, E. • Napflin, N. • Gaio, D. • Rot, G. • Danaila, M. • Peluso, M. E. et al.

Environmental DNA sequencing has revolutionized our understanding of microbial diversity and ecology. Microbiomes have now been sequenced across the entire planet, from the deep subsurface to the mountain tops, covering a myriad of hosts, biomes, and conditions. Yet, the diversity of sequencing and processing strategies hampers universal insights. MicrobeAtlas unifies more than two million microbiome samples in a single resource, harmonized to facilitate discoveries across technologies. Communities are hierarchically quantified at adjustable SSU rRNA marker gene resolution and feature detailed metadata, including rich geographic information. Connections to genome, phenotype, and ecological resources enable multimodal insights. Microbial lineages can be reliably tracked across environments, including a 'long tail' of rare, uncharacterized species. Recurring community structures and geographic preferences become apparent, and global, taxonomy-specific generalism trends emerge. With MicrobeAtlas (www.microbeatlas.org), both known and newly described species and communities can readily be placed into ecological context, taking full advantage of earlier work.

Similar Papers

biorxiv
Fri Jul 18 2025
Fast parameterization of Martini3 models for fragments and small molecules
Coarse-grained molecular dynamics simulations, such as those performed with the recently parametrized Martini 3 force field, simplify molecular models and enable the study of larger systems over longer timescales. With this new implementation, Martini 3 allows more bead types and sizes, becoming more amenable to study dynamical phenomena involving small molecules such as protein-ligand interaction...
Szczuka, M. • Pereira, G. P. • Walter, L. J. • Gueroult, M. • ... • Chavent, M.
biorxiv
Fri Jul 18 2025
Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge
Interpreting transcriptomic data presents significant challenges, particularly in non-targeted approaches. While modern functional enrichment methods are well-suited for experimental designs involving two conditions, they are less applicable to data series. In this context, we developed Cluefish, a free and open-source, semi-automated R workflow designed for untargeted, comprehensive biological in...
Franklin, E. • Billoir, E. • Veber, P. • Ohanessian, J. • ... • Prud'homme, S. M.
biorxiv
Fri Jul 18 2025
Inferring Progressive Disconnection in Alzheimer's Disease with Probabilistic Boolean Networks
The modern understanding of Alzheimer's disease as a disconnection syndrome presents the challenge of quantifying the directed influence between brain regions. To address this, we apply probabilistic Boolean networks to model effective brain connectivity for the first time, introducing a novel framework for analyzing functional magnetic resonance imaging data from a cohort comprising normal control...
Liu, Z. • Zhang, L.
biorxiv
Fri Jul 18 2025
RNA-xLSTM: Evaluating xLSTM as an Alternative Foundation to Transformers in RNA Modeling
Transformer-based architectures currently achieve state-of-the-art performance across a wide range of domains, including biological sequence modeling. Motivated by the recent introduction of the xLSTM architecture, we investigate its effectiveness for RNA sequence modeling by comparing a 33.7M-parameter RNA-xLSTM model against two leading RNA language models: RNA-FM and RiNALMo-33M. We pretrain RN...
Pintaric, M. • Penic, R. J. • Sikic, M.
biorxiv
Fri Jul 18 2025
traceax: a JAX-based framework for stochastic trace estimation
In many applications, from statistical inference to machine learning, calculating the trace of a matrix is a fundamental operation, yet may be infeasible due to memory constraints. Stochastic trace estimation offers a practical solution by using randomized matrix-vector products to obtain accurate, unbiased estimates without constructing the full matrix in memory. Here, we present traceax, a Pytho...
Nahid, A. A. • Serafin, L. • Mancuso, N.
biorxiv
Fri Jul 18 2025
Improved Mutation Detection in Duplex Sequencing Data with Sample-Specific Error Profiles
Duplex sequencing enables highly accurate detection of rare somatic mutations, but existing variant callers often rely on protocol-specific heuristics that limit sensitivity, reproducibility, and cross-study comparability. We present DupCaller, a probabilistic variant caller that builds sample-specific error profiles and applies a strand-aware statistical model for mutation detection. Across 50 sy...
Cheng, Y. • Nandi, S. • Culibrk, L. • Kristin, A. • ... • Alexandrov, L. B.
biorxiv
Fri Jul 18 2025
A Deep Learning-based Method for Drug Molecule Representation and Property Prediction
Accurately and robustly representing drug molecule features, predicting drug-target biomacromolecule interactions, and determining drug molecule physicochemical properties are crucial in drug development. However, due to issues such as insufficient generalization ability of single-modal representation, lack of multi-task prediction frameworks, and weak adaptability in cold-start scenarios, thes...
Zhang, Q. • Yu, X. • Wei, Y. • Wang, Z.-H. • Yu, D.-J.
biorxiv
Fri Jul 18 2025
scMILD: Single-cell Multiple Instance Learning for Sample Classification and Associated Subpopulation Discovery
Linking cellular states to clinical phenotypes is a major challenge in single-cell analysis. Here, we present scMILD, a weakly supervised Multiple Instance Learning framework that robustly identifies condition-associated cells using only sample-level labels. After systematically validating scMILD's accuracy through controlled simulations, we applied it to diverse disease datasets, confirming its a...
Jeong, K. • Choi, J. • Kim, K.
biorxiv
Fri Jul 18 2025
From 2D to 4D: a Containerized Workflow and Browser to Explore Dynamic Chromatin Architecture
Characterizing the physical organization of the genome is essential for understanding long-range gene regulation, chromatin compartmentalization, and epigenetic accessibility. Hi-C experiments generate two-dimensional (2D) genome-wide contact maps of chromatin interactions by capturing the spatial proximity between genomic loci, which reveal interaction frequencies but lack the spatial resolution ...
Rogers, D. H. • Roth, C. J. N. • Tauxe, C. • Lee, J. • ... • Starkenburg, S.
biorxiv
Fri Jul 18 2025
Enhancing STED Microscopy via Fluorescence Lifetime Unmixing and Filtering in Two-Species SPLIT-STED
Simultaneous super-resolution imaging of multiple fluorophores remains a major challenge in STimulated Emission Depletion (STED) microscopy due to spectral overlap of STED-compatible fluorophores. The combination of STED microscopy and Fluorescence Lifetime Imaging Microscopy (FLIM) offers a powerful alternative for super resolved, multiplexed imaging of biological samples but is hindered by lifet...
Deschenes, A. • Ollier, A. • Lafontaine, M. • Michaud-Gagnon, A. • ... • Lavoie-Cardinal, F.