2025 Hyper Recent • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

January 21st, 2025
Version: 5
University of Sydney
bioinformatics
biorxiv

Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment

Wen, B. • Freestone, J. A. • Riffle, M. • MacCoss, M. J. • Noble, W. S. • Keich, U.

A pressing statistical challenge in the field of mass spectrometry proteomics is how to assess whether a given software tool provides accurate error control. Each software tool for searching such data uses its own internally implemented methodology for reporting and controlling the error. Many of these software tools are closed-source, with incompletely documented methodology, and the strategies for validating the error are inconsistent across tools. In this work, we identify three different methods for validating false discovery rate (FDR) control in use in the field: one is invalid, one can only provide a lower bound rather than an upper bound, and one is valid but underpowered. As a result, the field has a very poor understanding of how well it is doing with respect to FDR control, particularly for the analysis of data-independent acquisition (DIA) data. We therefore propose a theoretical formulation of entrapment experiments that allows us to rigorously characterize the behavior of the various entrapment methods. We also propose a more powerful method for evaluating FDR control, and we employ that method, along with other existing techniques, to characterize a variety of popular search tools. We empirically validate our entrapment analysis in the fairly well-understood data-dependent acquisition (DDA) setup before applying it in the DIA setup. We find that none of the DIA search tools consistently controls the FDR at the peptide level, and the tools struggle particularly with the analysis of single-cell datasets.
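
The abstract does not state the estimators themselves, so the following is only an illustrative sketch of how entrapment-based estimates of the false discovery proportion (FDP) are often computed. It assumes a search against a database containing both real target sequences and added entrapment sequences (for example, from a species absent from the sample), with r the ratio of entrapment to target database size. The function name, argument names, and the three formulas are assumptions made for illustration and are not taken from this page; the full preprint should be consulted for the authors' definitions and for which forms they consider valid.

```python
def entrapment_estimates(n_target, n_entrapment, r):
    """Illustrative entrapment-based FDP estimates (assumed forms, not from the abstract).

    n_target:      number of reported discoveries matching original target sequences
    n_entrapment:  number of reported discoveries matching entrapment sequences
    r:             entrapment-to-target database size ratio (assumed > 0)
    """
    if n_target < 0 or n_entrapment < 0:
        raise ValueError("counts must be non-negative")
    if r <= 0:
        raise ValueError("r must be positive")

    total = n_target + n_entrapment
    if total == 0:
        # Nothing was reported, so there is nothing to estimate.
        return {"lower_bound": 0.0, "combined": 0.0, "target_only": 0.0}

    # Counts only entrapment hits as false. False hits among the targets are
    # ignored, so this can only bound the FDP from below.
    lower_bound = n_entrapment / total

    # Scales the entrapment count by (1 + 1/r) to also account for the false
    # target hits expected alongside the observed entrapment hits.
    combined = n_entrapment * (1.0 + 1.0 / r) / total

    # Estimates false target hits as n_entrapment / r and divides by the
    # number of target hits only.
    target_only = n_entrapment / (r * n_target) if n_target > 0 else float("inf")

    return {
        "lower_bound": lower_bound,
        "combined": combined,
        "target_only": target_only,
    }


# Hypothetical counts, chosen only to show the arithmetic.
estimates = entrapment_estimates(n_target=950, n_entrapment=50, r=1.0)
```

Comparing such estimates with the FDR threshold a tool claims to enforce is the basic idea behind an entrapment check: a lower-bound estimate above the threshold suggests a failure of control, whereas a lower-bound estimate below it is inconclusive.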
