2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

June 4th, 2025
Version: 1
University of Oxford
bioinformatics
biorxiv

REnformer, a single-cell ATAC-seq predicting model to investigate open chromatin sites

Riva, S. G. • Sanders, E. • Wilson, T. • Stranieri, N. • Gur, R. • Baxter, M. • Hughes, J. R.

Genome regulatory elements are fundamental to cellular identity and cell-type-specific gene expression. Understanding how different cell types use the underlying genetic code is central to understanding human health and disease. To better understand how DNA encodes genome regulatory elements such as promoters, enhancers, and boundary elements, we leverage the Enformer gene expression and epigenetic prediction model. We used transfer learning with high-quality single-cell ATAC datasets to develop REnformer, a model to predict chromatin accessibility. We introduce a benchmark for comparing performance against the Enformer model; on it, REnformer significantly outperformed Enformer, with higher prediction performance and lower error rates in all analyses shown. These benchmarks allow us, and possibly future work, to compare such models fairly. We further tested REnformer by predicting the effects of a well-characterised -thalassemia variant and found that the prediction aligned with the observed, previously validated change in the genome regulatory element. We conclude that REnformer is a state-of-the-art tool to predict cell-type-specific regulatory elements and to interrogate the effect of genome variation in health and disease.
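The abstract reports higher prediction performance and lower error rates but does not name the metrics used. As a minimal sketch only, assuming Pearson correlation as the performance score and mean squared error as the error rate, a benchmark comparing two models on the same observed accessibility track might look like the following. The function names, the model labels, and the synthetic data are all hypothetical and are not taken from the preprint.

```python
import numpy as np


def pearson_r(pred, obs):
    """Pearson correlation between a predicted and an observed signal track."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    pred_c = pred - pred.mean()
    obs_c = obs - obs.mean()
    denom = np.sqrt((pred_c ** 2).sum() * (obs_c ** 2).sum())
    # A constant track has zero variance, so the correlation is undefined.
    return float((pred_c * obs_c).sum() / denom) if denom > 0 else float("nan")


def mse(pred, obs):
    """Mean squared error between a predicted and an observed signal track."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(((pred - obs) ** 2).mean())


def compare_models(observed, predictions):
    """Score every model against the same observed track.

    `predictions` maps a (hypothetical) model label to its predicted track.
    """
    return {
        name: {"pearson_r": pearson_r(pred, observed), "mse": mse(pred, observed)}
        for name, pred in predictions.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in data: NOT real ATAC-seq signal or real model output.
    rng = np.random.default_rng(0)
    n_bins = 1000  # hypothetical number of genomic bins
    observed = rng.poisson(5.0, size=n_bins).astype(float)
    predictions = {
        "model_a": observed + rng.normal(0.0, 1.0, size=n_bins),
        "model_b": observed + rng.normal(0.0, 3.0, size=n_bins),
    }
    for name, scores in compare_models(observed, predictions).items():
        print(f"{name}: r={scores['pearson_r']:.3f}, mse={scores['mse']:.3f}")
```

Scoring both models against the identical observed track with the same functions is what makes the comparison fair; the metrics themselves could be swapped for whichever ones the preprint actually uses.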
