2025 Hyper Recent • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

June 5th, 2025
Version: 2
Astex Pharmaceuticals
bioinformatics
biorxiv

Supervised Deep Learning for Efficient Cryo-EM Image Alignment in Drug Discovery with cryoPARES

Sanchez-Garcia, R. • Berndt, A. • Apelbaum, A. • Reeks, J. • Williams, P. A. • Poelking, C. • Deane, C. • Saur, M.

Cryo-Electron Microscopy (cryo-EM) is a pivotal tool for determining the 3D structures of biological macromolecules. Current cryo-EM workflows, while effective, are computationally demanding and require manual intervention, creating bottlenecks for use in high-throughput scenarios such as structure-based drug discovery. Often in structure-based drug discovery, one can assume that all instances of a protein are equivalent at the resolutions needed for alignment, and it should therefore be possible to harness information about particle poses from previous refinements. Current methods, however, do not leverage this form of prior knowledge, instead aligning each dataset from scratch. We present cryoPARES, a deep learning pose estimation method trained on pre-aligned datasets. Our method not only provides accurate angular predictions significantly faster than traditional approaches but also introduces automated particle pruning capabilities that eliminate manual intervention. These features, together with its single-pass operation, can enable real-time reconstructions that provide feedback during data acquisition. We demonstrate cryoPARES's effectiveness through the rapid structural determination of six ligand-bound complexes across three distinct protein targets and release three new fragment-bound cryo-EM datasets.
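The abstract does not define how the accuracy of an angular prediction is measured. A standard way to compare a predicted particle orientation with a reference pose (for example, one taken from a previous refinement) is the geodesic angle of their relative rotation. The sketch below assumes that poses are given as 3×3 rotation matrices. The helper names (`rotation_z`, `rotation_y`, `angular_distance_deg`) are hypothetical illustrations and are not part of cryoPARES.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotation_z(angle_deg: float) -> Matrix:
    """Right-handed rotation about the z axis (angle in degrees)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_y(angle_deg: float) -> Matrix:
    """Right-handed rotation about the y axis (angle in degrees)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    """Transpose of a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def angular_distance_deg(r_pred: Matrix, r_ref: Matrix) -> float:
    """Geodesic angle, in degrees, between two orientations.

    The relative rotation R_rel = R_pred^T R_ref has rotation angle theta
    with trace(R_rel) = 1 + 2 cos(theta).
    """
    rel = matmul(transpose(r_pred), r_ref)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))
```

For example, `angular_distance_deg(rotation_z(10), matmul(rotation_z(10), rotation_y(20)))` returns approximately 20.0. The metric is symmetric and ranges from 0° to 180°. Unlike differences between individual Euler angles, it does not depend on the angle convention in use.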

Similar Papers

biorxiv
Fri Jun 06 2025
SCNT: An R Package for Data Analysis and Visualization of Single-Cell and Spatial Transcriptomics
Background: The emergence of single-cell (SC) and spatial transcriptomics (ST) has revolutionized our understanding of gene expression dynamics in complex tissues. However, it also presents challenges for data analysis and visualization, particularly due to the complexity of ST data and the diversity of analysis platforms. The SCNT (Single-Cell, Single-Nucleus, and Spatial Transcriptomics Analysis...
Qing, J. • Wu, J. • Li, Y. • Wu, J.
biorxiv
Fri Jun 06 2025
OriGene: A Self-Evolving Virtual Disease Biologist Automating Therapeutic Target Discovery
Therapeutic target discovery remains a critical yet intuition-driven bottleneck in drug development, typically relying on disease biologists to laboriously integrate diverse biomedical data into testable hypotheses for experimental validation. Here, we present OriGene, a self-evolving multi-agent system that functions as a virtual disease biologist, systematically identifying original and mechanis...
Zhang, Z. • Qiu, Z. • Wu, Y. • Li, S. • ... • Zheng, S.
biorxiv
Fri Jun 06 2025
Amira: detection of AMR genes directly from long reads using gene-space de Bruijn graphs
Accurate detection of antimicrobial resistance (AMR) genes is essential for the surveillance, epidemiology and genotypic prediction of AMR. This is typically done by generating an assembly from the sequencing reads of a bacterial isolate and running AMR gene detection tools on the assembly. However, despite advances in long-read sequencing that have greatly improved the quality and completeness of...
Anderson, D. • Lima, L. • Le, T. • Judd, L. M. • ... • Iqbal, Z.
biorxiv
Fri Jun 06 2025
Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein unknome. This large knowledge shortfall is one of the final frontiers of biology. Machine-Learning (ML) approaches are enticing, with early successes demonstrating the ability to propagate functional knowledge from experimentally characterized proteins. An open question is the a...
de Crecy-Lagard, V. • Dias, R. • Sexson, N. • Friedberg, I. • ... • Swairjo, M.
biorxiv
Fri Jun 06 2025
An improved model for prediction of de novo designed proteins with diverse geometries
Nature uses structural variations on protein folds to fine-tune the geometries of proteins for diverse functions, yet deep learning-based de novo protein design methods generate highly regular, idealized protein fold geometries that fail to capture natural diversity. Here, using physics-based design methods, we generated and experimentally validated a dataset of 5,996 stable, de novo designed prot...
Orr, B. • Crilly, S. E. • Akpinaroglu, D. • Zhu, E. • ... • Kortemme, T.
biorxiv
Fri Jun 06 2025
Pangenome-aware DeepVariant
Population-scale genomics information provides valuable prior knowledge for various genomic analyses, especially variant calling. A notable example of such application is the human pangenome reference released by the Human Pangenome Reference Consortium, which has been shown to improve read mapping and structural variant genotyping. In this work, we introduce pangenome-aware DeepVariant, a variant...
Asri, M. • Chang, P.-C. • Mier, J. C. • Siren, J. • ... • Shafin, K.
biorxiv
Fri Jun 06 2025
sCIN: A Contrastive Learning Framework for Single-Cell Multi-omics Data Integration
The rapid advancement of single-cell omics technologies such as scRNA-seq and scATAC-seq has transformed our understanding of cellular heterogeneity and regulatory mechanisms. However, integrating these data types remains challenging due to distributional discrepancies and distinct feature spaces. To address this, we present a novel single-cell Contrastive INtegration framework (sCIN), that integr...
Ebrahimi, A. • Siahpirani, A. F. • Montazeri, H.
biorxiv
Fri Jun 06 2025
Global profiling of the proteome and acetylome in mice with abdominal aortic aneurysms
Objective: Abdominal Aortic Aneurysm (AAA) is a life-threatening vascular disease with a high risk of rupture. Current treatments rely on surgery, as effective drug therapies remain unavailable due to limited understanding of disease mechanisms and a lack of therapeutic targets. This study aims to identify potential targets for pharmacological intervention through global proteomic and acetylomic a...
Yang, J. • Zhang, L. • Yang, B. • Ding, T. • ... • Liu, J.
biorxiv
Thu Jun 05 2025
Machine learning driven acceleration of biopharmaceutical formulation development using Excipient Prediction Software (ExPreSo)
Formulation development of protein biopharmaceuticals has become increasingly challenging due to new modalities and higher desired drug substance concentrations. The constraint in drug substance supply and the need for many analytical methods means that only a small selection of excipients can be thoroughly tested in the lab. There are few in-silico tools developed to refine the candidate excipien...
Vidal-Henriquez, E. • Holder, T. • Lee, N. F. • Pompe, C. • Teese, M. G.
biorxiv
Thu Jun 05 2025
Learning Genetic Perturbation Effects with Variational Causal Inference
Advances in sequencing technologies have enhanced the understanding of gene regulation in cells. In particular, Perturb-seq has enabled high-resolution profiling of the transcriptomic response to genetic perturbations at the single-cell level. This understanding has implications in functional genomics and potentially for identifying therapeutic targets. Various computational models have been devel...
Liu, E. • Zhang, J. • Uhler, C.