2025 Hyper Recent • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

June 2nd, 2025
Version: 1
University of Toronto Mississauga
bioinformatics
biorxiv

Deviation Error: assessing machine learning predictions for replicate measurements in genomics and beyond

Abdulnabi, H. • Westwood, J. T.

A quantitative measurement can vary, and this variation, referred to here as measurement variation, is described by a probability distribution. Machine learning models typically produce a prediction corresponding to the mode of the measurement variation. The Deviation Error, a novel metric described here, assesses predictions while accounting for measurement variation. Measurement variations in genomics data were explored. Towards a general prescription for modelling genomics measurements so as to reduce the Deviation Error, models were fitted with different loss functions on synthetically generated data that mimic genomics measurements. None of the other loss functions examined yielded models that performed significantly better than those fitted with the Mean Squared Error. However, variants of the Mean Squared Log Error and the Negative Log Likelihood yielded models that performed significantly worse.
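
For orientation, the three loss families named in the abstract can be sketched in their textbook forms. This is a minimal sketch only: the abstract refers to "variants" of the Mean Squared Log Error and the Negative Log Likelihood without defining them, so the standard definitions below (including the Gaussian form of the likelihood) are assumptions, and the function names, the Poisson-distributed replicates, and the mean-based prediction are hypothetical illustrations. The Deviation Error itself is not defined in this abstract and is therefore not implemented here.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error between measurements and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def msle(y_true, y_pred):
    """Mean Squared Log Error using log(1 + x); inputs must be > -1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))


def gaussian_nll(y_true, mu, sigma):
    """Mean negative log likelihood under a Gaussian with mean mu, std sigma.

    Assumption: the abstract does not say which likelihood was used;
    a Gaussian is chosen here purely for illustration.
    """
    y_true = np.asarray(y_true, dtype=float)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(sigma, dtype=float) ** 2
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var)
                         + (y_true - mu) ** 2 / (2.0 * var)))


# Hypothetical replicate measurements of one quantity (count-like data).
rng = np.random.default_rng(0)
replicates = rng.poisson(lam=20.0, size=8).astype(float)

# A single prediction for all replicates; here simply their mean.
prediction = np.full_like(replicates, replicates.mean())
spread = replicates.std() + 1e-8  # small offset guards against zero variance

print("MSE :", mse(replicates, prediction))
print("MSLE:", msle(replicates, prediction))
print("NLL :", gaussian_nll(replicates, prediction, spread))
```

Because the replicates differ from one another, no single prediction drives any of these losses to their minimum possible value for a noise-free target, which is the situation a metric that accounts for measurement variation is meant to address.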

Similar Papers

biorxiv
Wed Jun 04 2025
De Novo sequencing-assisted homology search for DIA data analysis enables low abundance peptide variants discovery
Data-independent acquisition mass spectrometry (DIA-MS) has emerged as a powerful approach for comprehensive proteome profiling. Spectral library search and library-free search are the two major approaches for DIA data analysis. The spectral library search requires high-quality spectral libraries derived from the search results of data-dependent acquisition (DDA) experiments, while library-free ap...
Qiao, R. • Li, H. • Bian, H. • Xin, L. • Shan, B.
biorxiv
Wed Jun 04 2025
Cell-ECM Graphs: A Graph-Based Method for Joint Analysis of Cells and the Extracellular Matrix
The extracellular matrix (ECM) provides essential structural and biochemical support to tissues, yet its spatial organization is often underrepresented in computational analyses of spatial proteomics. We present Cell-ECM Graphs, a computational framework that integrates ECM and cellular components into a unified graph structure, enabling joint modelling of cell-cell, ECM-ECM, and cell-ECM interact...
Ghafoor, M. • Parkinson, J. E. • Sutherland, T. E. • Rattray, M.
biorxiv
Wed Jun 04 2025
Single-cell morphological profiling reveals insights into programmed cell death
Analysis at the single-cell level is a powerful approach to study biological processes and responses to perturbations. However, its application in morphological profiling with phenomics remains underexplored. Here, we use the Cell Painting assay to investigate morphological effects of 53 small molecule compounds, associated with six distinct programmed cell death mechanisms, across six concentrati...
Frey, B. • Holmberg, D. • Bystrom, P. • Bergman, E. • ... • Spjuth, O.
biorxiv
Wed Jun 04 2025
Pasta, an age-shift transcriptomic clock, maps the chemical and genetic determinants of aging and rejuvenation
As the prevalence of age-related diseases rises, understanding and modulating the aging process is becoming a priority. Transcriptomic aging clocks (TACs) hold great promise for this endeavor, yet most are hampered by platform or tissue specificity and limited accessibility. Here, we introduce Pasta, a robust and broadly applicable TAC based on a novel age-shift learning strategy. Pasta accurately...
Salignon, J. • Tsiokou, M. • Marques, P. • Rodriguez-Diaz, E. • ... • Riedel, C. G.
biorxiv
Wed Jun 04 2025
REnformer, a single-cell ATAC-seq predicting model to investigate open chromatin sites
Genome regulatory elements are fundamental to cellular identity and cell type specific gene expression. Understanding how the underlying genetic code is differentially utilised by different cell types is central to understanding human health and disease. To better understand how DNA encodes genome regulatory elements such as promoters, enhancers, and boundary elements, we leverage the Enformer gen...
Riva, S. G. • Sanders, E. • Wilson, T. • Stranieri, N. • ... • Hughes, J. R.
biorxiv
Wed Jun 04 2025
PROTRIDER: Protein abundance outlier detection from mass spectrometry-based proteomics data with a conditional autoencoder
Detection of gene regulatory aberrations enhances our ability to interpret the impact of inherited and acquired genetic variation for rare disease diagnostics and tumor characterization. While numerous methods for calling RNA expression outliers from RNA-sequencing data have been proposed, the establishment of protein expression outliers from mass spectrometry data is lacking. Here, we propose and...
Klaproth-Andrade, D. • Scheller, I. • Tsitsiridis, G. • Loipfinger, S. • ... • Gagneur, J.
biorxiv
Tue Jun 03 2025
BifurcatoR: A Framework for Revealing Clinically Actionable Signal in Variance Masquerading as Noise
Background: Disease heterogeneity is a persistent challenge in medicine, complicating both research and treatment. Standard analytical pipelines often assume patient populations are homogeneous, overlooking variance patterns that may signal biologically distinct subgroups. Variance heterogeneity (VH), including skewness, outliers, and multimodal distributions, offers a powerful but underused lens ...
Madaj, Z. B. • Ding, M. • Khoo, C. K. • Tokarski, E. • ... • Triche, T. J.
biorxiv
Tue Jun 03 2025
Competing Subclones and Fitness Diversity Shape Tumor Evolution Across Cancer Types
Intratumor heterogeneity arises from ongoing somatic evolution complicating cancer diagnosis, prognosis, and treatment. Here we present TEATIME (estimating evolutionary events through single-timepoint sequencing), a novel computational framework that models tumors as mixtures of two competing cell populations: an ancestral clone with baseline fitness and a derived subclone with elevated fitness. U...
Chen, H. • Shu, J. • Mudappathi, R. • Li, E. • ... • Liu, L.
biorxiv
Tue Jun 03 2025
A bio-informatics approach to identify new drug targets in multidrug-resistant bacteria
Antibiotic resistance poses a global health crisis. In order to develop new antibiotic agents, it is crucial to identify drug targets in multidrug-resistant bacteria. The criteria for such a target are that it is an α-helical, essential membrane protein that is non-homologous to the human membrane proteome and present across multiple bacterial species. Using a stepwise subtractive genomics approach, the membr...
Ramsden, I. • de Jong-Hoogland, D. • Chiam, A. J. • Ulmschneider, M. B.
biorxiv
Tue Jun 03 2025
Risk evaluation of newly emerging flu viruses based on genomic sequences and AI
The recent resurgence of highly pathogenic avian influenza H5N1 viruses in North America and Europe has heightened global concerns regarding potential influenza pandemics. Despite significant progress in the surveillance and prevention of emerging influenza viruses, effective tools for rapid and accurate risk assessment remain limited. Here, we present FluRisk, an innovative computational framewor...
Li, H. • Feng, Y. • Lu, C. • Fu, P. • ... • Peng, Y.