2025 • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 18th, 2025
Version: 2
Université de Lorraine, CNRS, LIEC, F-57000 Metz, France
bioinformatics
biorxiv

Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge

Franklin, E. • Billoir, E. • Veber, P. • Ohanessian, J. • Delignette-Muller, M. L. • Prud'homme, S. M.

Interpreting transcriptomic data presents significant challenges, particularly in non-targeted approaches. While modern functional enrichment methods are well-suited for experimental designs involving two conditions, they are less applicable to data series. In this context, we developed Cluefish, a free and open-source, semi-automated R workflow designed for untargeted, comprehensive biological interpretation of transcriptomic data series. Cluefish applies over-representation analysis on pre-clustered protein-protein interaction networks, using clusters as anchors to identify smaller, more specific biological functions. Innovative features, including cluster merging and recovery of isolated genes through shared biological contexts, enable a more complete exploration of the data. We applied Cluefish to an in-house dataset from zebrafish exposed to a dose gradient of dibutyl phthalate, and to two published toxicology datasets featuring different organisms. Combined with DRomics, a tool for dose-response analysis, Cluefish identified gene clusters deregulated at low doses and linked to biological functions overlooked by the standard approach. Notably, it revealed that retinoid signalling disruption may be the most sensitive pathway affected by dibutyl phthalate during zebrafish development, potentially leading to morphological changes. The Cluefish workflow aims to provide valuable clues for biological hypothesis generation and experimental validation. It is freely available at https://github.com/ellfran-7/cluefish.
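The abstract says that Cluefish runs over-representation analysis (ORA) on each gene cluster. The sketch below illustrates the general technique for a single cluster: a one-sided hypergeometric test per annotation term, followed by Benjamini-Hochberg correction. It assumes a background gene set and a term-to-genes annotation mapping. It is not Cluefish's own implementation, which is written in R, and the function names (hypergeom_sf, bh_adjust, ora) and the toy gene and term identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum exact probabilities of observing k, k+1, ..., upper annotated genes.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ora(cluster, background, annotations):
    """Test each annotation term for over-representation in one gene cluster.

    Returns (term, overlap k, term size K, raw p, BH-adjusted p) tuples,
    sorted by raw p-value.
    """
    background = set(background)
    cluster = set(cluster) & background  # only genes in the background count
    N, n = len(background), len(cluster)
    terms, pvals = [], []
    for term, genes in annotations.items():
        term_genes = set(genes) & background
        K = len(term_genes)
        k = len(term_genes & cluster)
        if k == 0:
            continue  # terms with no cluster gene are not tested
        terms.append((term, k, K))
        pvals.append(hypergeom_sf(k, N, K, n))
    padj = bh_adjust(pvals)
    rows = [(t, k, K, p, q) for (t, k, K), p, q in zip(terms, pvals, padj)]
    return sorted(rows, key=lambda r: r[3])


if __name__ == "__main__":
    # Hypothetical toy data: 20 background genes and one 5-gene cluster.
    background = [f"g{i}" for i in range(20)]
    cluster = ["g0", "g1", "g2", "g3", "g4"]
    annotations = {
        "term_A": ["g0", "g1", "g2", "g3", "g10"],
        "term_B": ["g4", "g11", "g12", "g13", "g14", "g15"],
    }
    for row in ora(cluster, background, annotations):
        print(row)
```

Skipping terms that share no gene with the cluster reduces the number of tests that enter the multiple-testing correction. Other enrichment tools may test every term, so adjusted p-values can differ between implementations.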

Similar Papers

biorxiv
Fri Jul 18 2025
traceax: a JAX-based framework for stochastic trace estimation
In many applications, from statistical inference to machine learning, calculating the trace of a matrix is a fundamental operation, yet may be infeasible due to memory constraints. Stochastic trace estimation offers a practical solution by using randomized matrix-vector products to obtain accurate, unbiased estimates without constructing the full matrix in memory. Here, we present traceax, a Pytho...
Nahid, A. A. • Serafin, L. • Mancuso, N.
biorxiv
Fri Jul 18 2025
Fast parameterization of Martini3 models for fragments and small molecules
Coarse-grained molecular dynamics simulations, such as those performed with the recently parametrized Martini 3 force field, simplify molecular models and enable the study of larger systems over longer timescales. With this new implementation, Martini 3 allows more bead types and sizes, becoming more amenable to study dynamical phenomena involving small molecules such as protein-ligand interaction...
Szczuka, M. • Pereira, G. P. • Walter, L. J. • Gueroult, M. • ... • Chavent, M.
biorxiv
Fri Jul 18 2025
Enhancing STED Microscopy via Fluorescence Lifetime Unmixing and Filtering in Two-Species SPLIT-STED
Simultaneous super-resolution imaging of multiple fluorophores remains a major challenge in STimulated Emission Depletion (STED) microscopy due to spectral overlap of STED-compatible fluorophores. The combination of STED microscopy and Fluorescence Lifetime Imaging Microscopy (FLIM) offers a powerful alternative for super resolved, multiplexed imaging of biological samples but is hindered by lifet...
Deschenes, A. • Ollier, A. • Lafontaine, M. • Michaud-Gagnon, A. • ... • Lavoie-Cardinal, F.
biorxiv
Fri Jul 18 2025
rhinotypeR: An R package for Rhinovirus Genotyping
Rhinoviruses (RV) are common pathogens characterized by extremely high antigenic and genotypic diversity, yet the tools for their genotyping remain limited. We introduce rhinotypeR, an R package designed to streamline the genotyping of RVs using the VP4/2 region by automating sequence comparison against prototype strains and applying predefined pairwise distance thresholds. RhinotypeR offers a com...
Luka, M. M. • Nanjala, R. • Rashed, W. M. • Gatua, W. • Awe, O. I.
biorxiv
Fri Jul 18 2025
Molecularly informed analysis of histopathology images using natural language
Histopathology refers to the microscopic examination of diseased tissues and routinely guides treatment decisions for cancer and other diseases. Currently, this analysis focuses on morphological features but rarely considers gene expression information, which can add an important molecular dimension. Here, we introduce SpotWhisperer, an AI method that links histopathological images to spatial gene...
Schaefer, M. • Nonchev, K. • Awasthi, A. • Burton, J. • ... • Bock, C.
biorxiv
Fri Jul 18 2025
Improving ADMET prediction with descriptor augmentation of Mol2Vec embeddings
The accurate prediction of ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties is essential for early-stage drug development, helping to reduce late-stage attrition and guide compound prioritization. In recent years, machine learning models have emerged as powerful tools for ADMET prediction, leveraging diverse molecular representations ranging from handcrafted descrip...
Stratiichuk, R. • Shevchuk, N. • Kyrylenko, R. • Vozniak, V. • ... • Nafiev, A.
biorxiv
Fri Jul 18 2025
Inferring Progressive Disconnection in Alzheimer's Disease with Probabilistic Boolean Networks
The modern understanding of Alzheimer's disease as a disconnection syndrome presents the challenge of quantifying the directed influence between brain regions. To address this, we apply probabilistic Boolean networks to model effective brain connectivity for the first time, introducing a novel framework for analyzing functional magnetic resonance imaging data from a cohort comprising normal control...
Liu, Z. • Zhang, L.
biorxiv
Fri Jul 18 2025
VaxKG: Integrating The Vaccine Ontology And VIOLIN For Advanced Vaccine Queries And LLM-Powered Chat Systems
Vaccine research faces challenges in integrating diverse biomedical datasets. While the Vaccine Investigation and Online Information Network (VIOLIN) provides comprehensive vaccine data, its implementation in traditional relational models limits complex analysis. Similarly, the Vaccine Ontology (VO) offers standardized semantic frameworks but lacks comprehensive empirical data. This study addresses these ...
Yeh, F.-Y. • Asato, M. • Zheng, J. • He, Y.
biorxiv
Fri Jul 18 2025
AtlasAgent: Vision language model and Agent-guided Framework for Evaluation of Atlas-scale Single-cell Integration
As single-cell omics transitions into the era of AI-virtual cells (AIVC), where large-scale single-cell data integration becomes prevalent, the computational demands of integration evaluation emerge as critical scalability bottlenecks. Traditional integration evaluation pipelines, requiring metrics like k-nearest-neighbor batch effect test (kBET) and Local Inverse Simpson's Index (iLISI) employed...
Yin, D. • Zhang, Z. • Liu, X. • Ni, K. • ... • Ho, J. W. K.
biorxiv
Fri Jul 18 2025
The MicrobeAtlas database: Global trends and insights into Earth's microbial ecosystems
Environmental DNA sequencing has revolutionized our understanding of microbial diversity and ecology. Microbiomes have now been sequenced across the entire planet - from the deep subsurface to the mountain tops - covering a myriad of hosts, biomes, and conditions. Yet, the diversity of sequencing and processing strategies hampers universal insights. MicrobeAtlas unifies more than two million micro...
Rodrigues, J. R. F. M. • Tackmann, J. • Malfertheiner, L. • Patsch, D. • ... • von Mering, C.