2025 Hyper Recent • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

June 30th, 2025
Version: 3
HITSZ
bioinformatics
biorxiv

Controllable Protein Design by Prefix-Tuning Protein Language Models

Luo, J. • Liu, X. • Li, J. • Zhang, Y. • Chen, Q. • Chen, J.

The design of novel proteins with tailored functionalities, particularly in drug discovery and vaccine development, presents a transformative approach to addressing pressing biomedical challenges. Inspired by the remarkable success of pre-trained language models in natural language processing (NLP), protein language models (ProtLMs) have emerged as powerful tools for advancing protein science. While NLP leverages flexible text-based control tags to prompt language model generation, the restricted amino acid vocabulary (limited to 20 residues) imposes inherent constraints on achieving analogous controllability. In this study, we propose PrefixProt, a framework for controllable protein design that employs prefix tuning to learn virtual tokens as control tags. These virtual tokens are adapted to diverse protein properties in a data-driven manner and can be combined to enable multi-objective control over protein generation. The effectiveness of PrefixProt was validated through extensive experiments encompassing both protein structure design (e.g., alpha-helix or beta-sheet topologies) and protein function design (e.g., antimicrobial or anticancer peptide activities). Benchmark results demonstrate that prefix virtual tokens efficiently guide the pre-trained ProtLM by optimizing a smaller number of trainable parameters, outperforming other parameter-efficient fine-tuning methods and text-guided ProtLMs, particularly in scenarios with limited data availability. More importantly, the compositional flexibility of virtual tokens facilitates the generation of proteins with multiple target properties, substantially expanding the scope of design possibilities. By combining controllability, efficiency and generalizability, PrefixProt establishes a robust framework for de novo protein design, with promising applications in drug discovery and biomedicine.
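The idea of learned virtual tokens acting as control tags can be illustrated with a toy sketch. The code below is not the PrefixProt implementation: it is a simplified stand-in that prepends trainable vectors at the input embedding layer only, whereas prefix tuning as usually described also injects trainable vectors into each attention layer. The embedding width, the prefix lengths, the names `new_prefix` and `build_model_input`, and the choice to combine two properties by concatenating their prefixes are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
EMBED_DIM = 8  # hypothetical toy embedding width

rng = random.Random(0)

# Frozen "pre-trained" embedding table: a stand-in for the ProtLM weights,
# which are never updated during prefix tuning.
FROZEN_EMBEDDINGS = {
    aa: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for aa in AMINO_ACIDS
}


def new_prefix(num_virtual_tokens):
    """Create the trainable virtual-token vectors for one property (hypothetical helper)."""
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
        for _ in range(num_virtual_tokens)
    ]


def build_model_input(prefixes, sequence):
    """Prepend every prefix, in order, to the frozen residue embeddings.

    Concatenating prefixes is an assumed way to combine control tags.
    """
    virtual = [vec for prefix in prefixes for vec in prefix]
    residues = [FROZEN_EMBEDDINGS[aa] for aa in sequence]
    return virtual + residues


def count_params(vectors):
    """Count the scalar parameters in a list of vectors."""
    return sum(len(v) for v in vectors)


helix_prefix = new_prefix(4)  # hypothetical "alpha-helix" control tag
amp_prefix = new_prefix(4)    # hypothetical "antimicrobial" control tag

single = build_model_input([helix_prefix], "MKV")
combined = build_model_input([helix_prefix, amp_prefix], "MKV")

trainable = count_params(helix_prefix) + count_params(amp_prefix)
frozen = count_params(list(FROZEN_EMBEDDINGS.values()))
print(len(single), len(combined), trainable, frozen)
```

In this sketch only the prefix vectors would receive gradient updates; the frozen table is shared by every property, which is why adding a new control tag adds only a few parameters.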

Similar Papers

biorxiv
Mon Jun 30 2025
STORIES: learning cell fate landscapes from spatial transcriptomics
In dynamic biological processes such as development, spatial transcriptomics is revolutionizing the study of the mechanisms underlying spatial organization within tissues. Inferring cell fate trajectories from spatial transcriptomics profiled at several time points has thus emerged as a critical goal, requiring novel computational methods. Wasserstein gradient flow learning is a promising framewor...
Huizing, G.-J. • Samaran, J. • Capocefalo, D. • Audit, A. ... • Cantini, L.
biorxiv
Mon Jun 30 2025
scHDeepInsight: A Hierarchical Deep Learning Framework for Precise Immune Cell Annotation in Single-Cell RNA-seq Data
Immune cell classification from single-cell RNA sequencing (scRNA-seq) presents significant challenges due to complex hierarchical relationships among cell types. We introduce scHDeepInsight, a deep learning framework that extends our previous scDeepInsight model by integrating a biologically-informed classification architecture with an adaptive hierarchical focal loss. The framework leverages our...
Jia, S. • Lysenko, A. • Boroevich, K. A. • Sharma, A. • Tsunoda, T.
biorxiv
Mon Jun 30 2025
Identifying Optimal Machine Learning Approaches for Microbiome-Metabolomics Integration with Stable Feature Selection
Microbiome research has been limited by methodological inconsistencies. Taxonomy-based profiling presents challenges such as data sparsity, variable taxonomic resolution, and the reliance on DNA-based profiling, which provides limited functional insight. Multi-omics integration has emerged as a promising approach to link microbiome composition with function. However, the lack of standardized metho...
Palmer, S. N. • Mishra, A. A. • Gan, S. • Liu, D. ... • Zhan, X.
biorxiv
Mon Jun 30 2025
FEDRANN: effective long-read overlap detection based on dimensionality reduction and approximate nearest neighbors
Overlap detection is a key step in de novo genome assembly pipelines based on the Overlap-Layout-Consensus (OLC) paradigm. However, existing methods for overlap detection either rely on heuristic seed-and-extension strategies or locality-sensitive hashing (LSH), both of which struggle to handle repetitive genomic regions and the computational burden of large-scale datasets. Here, we present FEDRAN...
Zhang, J.-Y. • Miao, C. • Qiu, T. • Xia, X. ... • Dong, Y.
biorxiv
Mon Jun 30 2025
CpGeneAge: multi-omics aging clocks associated with Nf-κB signaling pathway in aging
Aging clocks have emerged as the primary tools for measuring biological aging and have been developed for a wide range of single-omic measurements. Epigenetic aging clocks showed high accuracy in age prediction, however, their biological interpretation is still a challenging task. Transcriptomics aging clocks provide better interpretability but worse age prediction accuracy. To exploit the benefit...
Varga, B. • Kerepesi, C.
biorxiv
Mon Jun 30 2025
Molecular characterization of unique multi-domain harbouring fungal rhodopsin for establishing their novel opto-synthetic biological usages
Organisms employ light as an external stimulus for regulating cellular functions. The light-sensitive photoreceptors detect light at varying wavelengths, activating signaling cascades and triggering a range of physiological responses. Rhodopsin is a transmembrane heptahelical protein that functions as an ion channel, or a pump, and sensory receptor, respectively. It consists of a light-sensing chr...
Kumari, A. • Kumar, A. • Sharma, K. • Pati, S. R. ... • Kateriya, S.
biorxiv
Mon Jun 30 2025
Quantitative analysis of genetic interactions in human cells from genome-wide CRISPR-Cas9 screens
Genetic interaction (GI) networks in model organisms have revealed how combinations of genome variants can impact phenotypes and underscored the value of GI maps for functional genomics. To advance efforts toward a reference human GI network, we developed the quantitative Genetic Interaction (qGI) score, a method for precise GI measurement from genome-wide CRISPR-Cas9 screens in isogenic human cel...
Billmann, M. • Costanzo, M. • Rahman, M. • Chan, K. ... • Myers, C. L.
biorxiv
Mon Jun 30 2025
A Systematic Benchmark of High-Accuracy PacBio Long-Read RNA Sequencing for Transcript-Level Quantification
PacBio long-read RNA sequencing resolves transcripts with greater clarity than short-read technologies, yet its quantitative performance remains under-evaluated at scale. Here, we benchmark the high-throughput PacBio Kinnex platform against Illumina short-read RNA-seq using matched, deeply sequenced datasets across a time course of endothelial cell differentiation. Compared to Illumina, Kinnex ach...
Wissel, D. • Mehlferber, M. M. • Nguyen, K. M. • Pavelko, V. ... • Sheynkman, G. M.
biorxiv
Mon Jun 30 2025
Genomic Touchstone: Benchmarking Genomic Language Models in the Context of the Central Dogma
The emergence of genomic language models (gLMs) has revolutionized the analysis of genomic sequences, enabling robust capture of biologically meaningful patterns from DNA sequences for an improved understanding of human genome-wide regulatory programs, variant pathogenicity and therapeutic discovery. Given that DNA serves as the foundational blueprint within the central dogma, the ultimate evaluat...
Wang, Y. • Cai, Z. • Zeng, Q. • Gao, Y. ... • Chen, H.
biorxiv
Mon Jun 30 2025
Cell type-specific functions of nucleic acid-binding proteins revealed by deep learning on co-expression networks
Nucleic acid-binding proteins (NABPs) exhibit cell type-specific regulatory functions, but their target genes and biological roles remain incompletely characterized due to the limitations of current experimental approaches. Here, we present a deep learning framework that integrates gene co-expression correlations to predict NABP regulatory targets and infer their functions across diverse cellular ...
Osato, N. • Sato, K.