2025 Hyper Recent • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 18th, 2025
Version: 1
School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong
bioinformatics
biorxiv

AtlasAgent: Vision language model and Agent-guided Framework for Evaluation of Atlas-scale Single-cell Integration

Yin, D. • Zhang, Z. • Liu, X. • Ni, K. • Su, H. • Li, N. L. L. • Dong, H. • Zhao, Q. • Lin, X. • Tian, L. et al.

As single-cell omics transitions into the era of AI virtual cells (AIVC), where large-scale single-cell data integration becomes prevalent, the computational demands of integration evaluation emerge as a critical scalability bottleneck. Traditional integration evaluation pipelines rely on metrics such as the k-nearest-neighbor batch effect test (kBET) and the integration Local Inverse Simpson's Index (iLISI), both employed by the state-of-the-art scIB method. These pipelines often demand large computational resources and long runtimes, making them infeasible for large-scale integration studies. Herein, we present AtlasAgent, the first vision-language model (VLM)-powered, AI agent-guided framework to accelerate atlas-scale integration evaluation at unprecedented speed and scale. We systematically evaluate batch correction quality, biological signal preservation, and overcorrection risks using chain-of-thought reasoning in conjunction with few-shot and zero-shot prompting strategies. AtlasAgent completes evaluation within 32 seconds, in contrast to a scIB runtime of 5.55 hours on GPU, and places the scIB-determined best integration method within its top 3 in 88.3% of cases. This lowers evaluation time from hours to seconds while preserving alignment with domain expert reasoning. AtlasAgent pioneers the use of VLMs to realize scalable and rapid integration evaluation at atlas scale.
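The two headline figures in the abstract, the top-3 agreement rate and the runtime reduction, can be illustrated with a minimal Python sketch. It assumes that "top-3 agreement" means the fraction of evaluation cases in which the reference-best method (here, the one scIB ranks first) appears among the first three methods of the predicted ranking; the abstract does not spell out the exact protocol. The function names, the method labels, and the toy rankings are all hypothetical and are not taken from the preprint.

```python
from typing import Sequence


def top_k_hit_rate(
    reference_best: Sequence[str],
    predicted_rankings: Sequence[Sequence[str]],
    k: int = 3,
) -> float:
    """Fraction of cases whose reference-best method is in the predicted top k."""
    if len(reference_best) != len(predicted_rankings):
        raise ValueError("inputs must have the same length")
    if not reference_best:
        raise ValueError("need at least one evaluation case")
    hits = sum(
        best in ranking[:k]
        for best, ranking in zip(reference_best, predicted_rankings)
    )
    return hits / len(reference_best)


def speedup(baseline_hours: float, new_seconds: float) -> float:
    """Ratio of a baseline runtime in hours to a new runtime in seconds."""
    return baseline_hours * 3600.0 / new_seconds


# Hypothetical toy data: four evaluation cases with placeholder method labels.
reference_best = ["method_A", "method_B", "method_C", "method_D"]
predicted = [
    ["method_B", "method_A", "method_D", "method_C"],  # hit: method_A at rank 2
    ["method_B", "method_A", "method_C", "method_D"],  # hit: method_B at rank 1
    ["method_A", "method_B", "method_D", "method_C"],  # miss: method_C at rank 4
    ["method_A", "method_D", "method_B", "method_C"],  # hit: method_D at rank 2
]

rate = top_k_hit_rate(reference_best, predicted, k=3)
ratio = speedup(baseline_hours=5.55, new_seconds=32.0)

print(f"top-3 hit rate on toy data: {rate:.2f}")
print(f"runtime ratio from the abstract's figures: about {ratio:.0f}x")
```

With the runtimes quoted in the abstract, 5.55 hours is 19,980 seconds, so the ratio against 32 seconds is roughly 624. The hit rate printed here reflects only the toy rankings, not the reported 88.3%.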
