2025 Hyper Recent • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

January 22nd, 2025
Version: 1
Ecole de physique et chimie industrielle de Paris
bioinformatics
biorxiv

Hybrid Generative Model: Bridging Machine Learning and Biophysics to Expand RNA Functional Diversity

Opuu, V.

The RNA world hypothesis suggests that RNA once catalyzed reactions now performed by proteins. Rediscovering these functions requires exploring sequence spaces beyond natural RNAs. While machine learning (ML)-based RNA design shows promise, it struggles to extrapolate beyond training data. In contrast, biophysics-based approaches leveraging RNA secondary (2D) structure operate independently of training data but are not tailored for functional discovery. We present a hybrid generative model that combines a Potts model with the thermodynamic folding model of RNA 2D structure. This approach disentangles folding contributions from functional signals, such as binding, enabling the data-driven component to focus on tertiary interactions and improving contact predictions. This disentanglement introduces structural imprinting, a novel strategy that uses structural variability to guide mutations, which showed great promise in uncovering hidden natural diversity. By bridging ML and biophysics, this model tackles the longstanding challenge of expanding diversity beyond the mere reproduction of the training data.
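
As a rough illustration of the idea in the abstract, the sketch below scores a sequence with a Potts term (fields and couplings) plus a folding term evaluated on a fixed 2D structure. It is not the author's implementation. The additive combination, the function names, and the toy base-pair energies are all assumptions made for illustration; a real model would replace `toy_folding_energy` with a nearest-neighbour thermodynamic folding model and learn `h` and `J` from data.

```python
# Minimal sketch of a hybrid sequence energy: Potts term + folding term.
# All names and numeric values here are hypothetical, chosen for illustration.

ALPHABET = "ACGU"
IDX = {c: i for i, c in enumerate(ALPHABET)}

# Toy pair energies in arbitrary units. These are NOT real thermodynamic
# parameters; they only reward canonical and wobble base pairs.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MISMATCH_PENALTY = 1.0  # assumed penalty for a non-complementary pair


def parse_pairs(dot_bracket):
    """Return the base-paired positions (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def toy_folding_energy(seq, pairs):
    """Stand-in for a thermodynamic folding energy on a fixed 2D structure."""
    return sum(PAIR_ENERGY.get((seq[i], seq[j]), MISMATCH_PENALTY)
               for i, j in pairs)


def potts_energy(seq, h, J):
    """Potts energy: minus the sum of site fields and pairwise couplings.

    h[i][a] is the field for nucleotide a at position i.
    J maps a position pair (i, j) to a 4x4 coupling matrix.
    """
    energy = -sum(h[i][IDX[a]] for i, a in enumerate(seq))
    for (i, j), J_ij in J.items():
        energy -= J_ij[IDX[seq[i]]][IDX[seq[j]]]
    return energy


def hybrid_energy(seq, h, J, pairs):
    """Assumed additive combination of the data-driven and folding terms."""
    return potts_energy(seq, h, J) + toy_folding_energy(seq, pairs)


seq = "GGGAAACCC"
pairs = parse_pairs("(((...)))")
h = [[0.0] * 4 for _ in seq]          # no site preferences in this toy case
J = {(3, 5): [[0.0] * 4 for _ in range(4)]}
J[(3, 5)][IDX["A"]][IDX["A"]] = 0.5   # one coupling between two loop positions
print(hybrid_energy(seq, h, J, pairs))
```

In this toy setup the helix positions are scored entirely by the folding term, so the only coupling that carries information sits between two unpaired loop positions. That mirrors the disentanglement described in the abstract, where the data-driven component is left free to capture signals that the 2D folding model does not explain.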
