2025 • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 18th, 2025
Version: 1
Nanjing University of Science and Technology
bioinformatics
biorxiv

A Deep Learning-based Method for Drug Molecule Representation and Property Prediction

Zhang, Q. • Yu, X. • Wei, y. • Wang, Z.-H. • Yu, D.-J.

Accurately and robustly representing the features of drug molecules, predicting interactions between drugs and target biomacromolecules, and determining the physicochemical properties of drug molecules are crucial steps in drug development. However, these tasks remain challenging because single-modal representations generalize poorly, multi-task prediction frameworks are lacking, and existing methods adapt weakly to cold-start scenarios. Here, we introduce DrugDL, a framework for drug molecule representation and the prediction of multiple downstream tasks, including drug-target interactions, binding affinities, binding sites, physicochemical properties, toxicity, and drug-drug interactions. By introducing cross-modal contrastive learning and single-modal feature enhancement algorithms, DrugDL jointly learns representations of the drug chemical space and the target protein biological space and analyzes the multi-scale interaction mechanisms between drug molecules and target proteins. It employs a multi-task prediction framework to predict multiple properties of drug molecules. In practical applications, DrugDL outperforms state-of-the-art methods, especially in cold-start tasks. It has been successfully applied to high-throughput screening, identifying inhibitors of SARS-CoV-2 and of metabolic enzymes, and helps predict cancer-targeted drugs. Validations on the EGFR and ALK targets confirm its efficiency as a precise drug discovery tool. By combining accurate molecular representation with multi-property prediction, DrugDL provides full-chain technical support for drug development, significantly accelerating the drug discovery process.
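The abstract names cross-modal contrastive learning as the mechanism that aligns the drug chemical space with the target protein space, but it does not give the loss function. As a hedged illustration only, the sketch below implements a generic symmetric InfoNCE (CLIP-style) objective over paired drug and protein embeddings. The function names, the 0.07 temperature, and the use of in-batch negatives are assumptions for illustration, not details of DrugDL.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical illustration: a generic symmetric InfoNCE loss, NOT DrugDL's published objective.


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax_at(row: Vector, index: int) -> float:
    """Return log(softmax(row)[index]) using the log-sum-exp trick for stability."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[index] - log_sum


def cross_modal_info_nce(drugs: List[Vector], targets: List[Vector],
                         temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: (drugs[i], targets[i]) is the positive pair and every
    other pairing in the batch is a negative (assumed in-batch negatives)."""
    if len(drugs) != len(targets) or not drugs:
        raise ValueError("need equally sized, non-empty batches")
    d = [l2_normalize(v) for v in drugs]
    t = [l2_normalize(v) for v in targets]
    n = len(d)
    # logits[i][j] = cosine(drug_i, target_j) / temperature
    logits = [[sum(a * b for a, b in zip(d[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Drug -> target direction: each row should select its own target.
    loss_d2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Target -> drug direction: each column should select its own drug.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2d = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_d2t + loss_t2d)


if __name__ == "__main__":
    drug_emb = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # each target points toward its own drug
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # targets paired with the wrong drug
    print(cross_modal_info_nce(drug_emb, aligned))  # small loss
    print(cross_modal_info_nce(drug_emb, swapped))  # much larger loss
```

Minimizing this loss pulls each drug embedding toward the embedding of its paired target and pushes it away from the other targets in the batch, which is one common way to obtain a shared drug-protein representation space; the actual DrugDL objective may differ.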

Similar Papers

biorxiv
Fri Jul 18 2025
scMILD: Single-cell Multiple Instance Learning for Sample Classification and Associated Subpopulation Discovery
Linking cellular states to clinical phenotypes is a major challenge in single-cell analysis. Here, we present scMILD, a weakly supervised Multiple Instance Learning framework that robustly identifies condition-associated cells using only sample-level labels. After systematically validating scMILD's accuracy through controlled simulations, we applied it to diverse disease datasets, confirming its a...
Jeong, K. • Choi, J. • Kim, K.
biorxiv
Fri Jul 18 2025
Inferring Progressive Disconnection in Alzheimer's Disease with Probabilistic Boolean Networks
The modern understanding of Alzheimer's disease as a disconnection syndrome presents the challenge of quantifying the directed influence between brain regions. To address this, we apply probabilistic Boolean networks to model effective brain connectivity for the first time, introducing a novel framework for analyzing functional magnetic resonance imaging data from a cohort comprising normal control...
Liu, Z. • Zhang, L.
biorxiv
Thu Jul 17 2025
Mapping the Metalloproteome of Deinococcus indicus DR1 through Integrative Structure and Function Annotation
Deinococcus indicus DR1 is a rod-shaped bacterium isolated from the Dadri wetlands (Uttar Pradesh, India) that tolerates ionizing radiation and arsenic. The molecular basis of its wider heavy-metal resilience, particularly among the 1017 out of 4128 proteins still annotated as hypothetical, remains unclear. We performed a proteome-wide structural and functional survey to address this gap. All the ...
Ramesh, S. D. • Vasan, G. • Senthilkumar, S. • Thambiraja, M. • ... • Yennamalli, R. M.
biorxiv
Thu Jul 17 2025
SVPG: A pangenome-based structural variant detection approach and rapid augmentation of pangenome graphs with new samples
Breakthrough advances in long-read sequencing technologies have opened unprecedented opportunities to study genetic variations through comprehensive pangenome analysis. However, the availability of structural variant (SV) calling tools that can effectively leverage pangenome information is limited. In addition, efficient construction of pangenome graphs becomes increasingly challenging with acquis...
Hu, H. • Gao, R. • Jiang, Z. • Cao, S. • ... • Wang, G.
biorxiv
Thu Jul 17 2025
MiroSCOPE: An AI-driven digital pathology platform for annotating functional tissue units
Cancer tissue analysis in digital pathology is typically conducted across different spatial scales, ranging from high-resolution cell-level modeling to lower-resolution tile-based assessments. However, these perspectives often overlook the structural organization of functional tissue units (FTUs), the small, repeating structures which are crucial to tissue function and key factors during pathologi...
Fenner, M. R. • Sevim, S. • Wu, G. • Beavers, D. • ... • Demir, E.
biorxiv
Thu Jul 17 2025
A periodic table of bacteria?: Mapping bacterial diversity in trait space
Bacterial diversity can be overwhelming. There is an ever-expanding number of bacterial taxa being discovered, but many of these taxa remain uncharacterized with unknown traits and environmental preferences. This diversity makes it challenging to interpret ecological patterns in microbiomes and understand why individual taxa, or assemblages, may vary across space and time. While we can use informa...
Hoffert, M. C. • Lladser, M. E. • Gorman, E. D. • Fierer, N.
biorxiv
Thu Jul 17 2025
PromoterAtlas: decoding regulatory sequences across Gammaproteobacteria using a transformer model
Recent advances in deep learning, particularly transformer architectures, have improved computational approaches for biological sequence analysis. Despite these advances, computational models for bacterial promoter prediction have remained limited by small datasets, species-specific training, and binary classification approaches rather than comprehensive annotation frameworks. We present PromoterA...
Coppens, L. • Ledesma-Amaro, R.
biorxiv
Thu Jul 17 2025
mm2-ivh: simple and precise overlap detection in alpha satellite HORs with interval hashing
Summary: We propose a new algorithm, "interval hashing," which distinguishes identical k-mers arising from different repeat sequences, particularly in complex repeat arrays such as alpha satellite HORs. We implement this algorithm as a fork of minimap2, named mm2-ivh. In local assembly of alpha satellite HORs, mm2-ivh accurately reconstructs more haplotypes than assemblers using standard minimiz...
Suzuki, H. • Sugawa, M. • Sakamoto, Y. • Shiraishi, Y.
biorxiv
Thu Jul 17 2025
scDNAm-GPT: A Foundation Model for Capturing Long-Range CpG Dependencies in Single-Cell Whole-Genome Bisulfite Sequencing to Enhance Epigenetic Analysis
Accurately identifying development- and disease-associated DNA methylation features from single-cell DNA methylation data remains challenging due to the genome-wide scale and the sparse, stochastic nature of CpG coverage. We present scDNAm-GPT, a novel framework that integrates CpG token design, a Mamba backbone, and a cross-attention head to efficiently process ultra-long sequences while preservi...
Liang, C. • Ye, P. • Yan, H. • Zheng, P. • ... • Li, J.