2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

June 6th, 2025
Version: 1
Google Inc, Mountain View, CA, USA
bioinformatics
biorxiv

Pangenome-aware DeepVariant

Asri, M. • Chang, P.-C. • Mier, J. C. • Siren, J. • Eskandar, P. • Kolesnikov, A. • Cook, D. E. • Brambrink, L. • Hickey, G. • Novak, A. M. et al.

Population-scale genomic information provides valuable prior knowledge for many genomic analyses, especially variant calling. A notable example is the human pangenome reference released by the Human Pangenome Reference Consortium, which has been shown to improve read mapping and structural variant genotyping. In this work, we introduce pangenome-aware DeepVariant, a variant caller that uses a pangenome reference alongside sample-specific read alignments. It generates pileup images of both reads and pangenome haplotypes near potential variants and uses a convolutional neural network (CNN) to infer genotypes. This approach uses the pangenome directly to distinguish true variant signals from sequencing or alignment noise. We assessed its performance on various short-read sequencing platforms and read mappers. Across all settings, pangenome-aware DeepVariant outperformed the linear-reference-based DeepVariant, reducing errors by up to 25.5%. We also show that pangenome-aware DeepVariant applied to Element reads can achieve 23.6% more accurate variant calling than existing methods.
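The idea of placing reads and pangenome haplotypes in the same pileup can be illustrated with a small toy sketch. This is not DeepVariant's actual encoding or API: the function names (`encode_row`, `build_pileup`, `alt_support`), the integer base codes, and the row layout are all hypothetical, chosen only to show how read rows and haplotype rows might sit side by side in one matrix that a model could inspect.

```python
# Toy illustration only: NOT DeepVariant's real pileup encoding.
# All names and the integer base codes below are hypothetical.
BASE_CODES = {"-": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def encode_row(sequence, width):
    """Encode one aligned sequence as a fixed-width row of integer base codes."""
    padded = sequence[:width].ljust(width, "-")
    return [BASE_CODES.get(base, 0) for base in padded]


def build_pileup(reference, reads, haplotypes, width):
    """Stack the reference, read rows, and pangenome haplotype rows.

    Each row is tagged with its source so that reads and haplotypes
    remain distinguishable in the combined matrix.
    """
    rows = [("reference", encode_row(reference, width))]
    rows += [("read", encode_row(read, width)) for read in reads]
    rows += [("haplotype", encode_row(hap, width)) for hap in haplotypes]
    return rows


def alt_support(pileup, column, alt_base):
    """Count read rows and haplotype rows carrying alt_base at a column."""
    code = BASE_CODES[alt_base]
    counts = {"read": 0, "haplotype": 0}
    for source, row in pileup:
        if source in counts and row[column] == code:
            counts[source] += 1
    return counts


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = ["ACGTTCGT", "ACGTTCGT", "ACGTACGT"]
    haplotypes = ["ACGTTCGT", "ACGTACGT"]
    pileup = build_pileup(reference, reads, haplotypes, width=8)
    # Support for a candidate A>T substitution at 0-based column 4.
    print(alt_support(pileup, 4, "T"))
```

In this toy setting, a candidate allele seen in the reads and also carried by a pangenome haplotype has support from two independent sources, which is the kind of signal the abstract describes the CNN as using to separate true variants from noise. The real model learns from image-like tensors rather than from explicit counts.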
