2025 • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

January 22nd, 2025
Version: 1
University of Aveiro
bioinformatics
biorxiv

Comparative evaluation of computational methods for reconstruction of human viral genomes

Sousa, M. J. P. • Toppinen, M. • Pyöriä, L. • Hedman, K. • Sajantila, A. • Perdomo, M. F. • Pratas, D.

The increasing availability of viral sequences has led to the emergence of many optimized viral genome reconstruction tools. Because the number of new tools is steadily increasing, it is difficult to identify which tools are functional and optimized, which offer a balance between accuracy and computational resources, and which features each tool provides. In this paper, we surveyed open-source computational tools (including pipelines) used for human viral genome reconstruction, identifying their specific characteristics and features as well as the similarities and differences among them. For quantitative comparison, we created an open-source reconstruction benchmark based on viral data. The benchmark was executed using both synthetic and real datasets. With the synthetic datasets, we evaluated how the reconstruction process is affected by different human viruses, simulated mutation rates, contamination, the inclusion of mitochondrial DNA, and various coverage depths. Each reconstruction program was also evaluated on real datasets, demonstrating its performance in real-life scenarios. The evaluation measures include the identity, the Normalized Compression Semi-Distance (NCSD), and the Normalized Relative Compression (NRC) between the genomes before and after reconstruction, as well as metrics on the length of the reconstructed genomes and on the computational time and resources spent by each tool. The benchmark is fully reproducible and freely available at https://github.com/viromelab/HVRS.
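
To make the Normalized Relative Compression measure more concrete, the sketch below assumes the common definition NRC(x||y) = C(x||y) / (|x| log2|A|), where C(x||y) is the number of bits needed to encode sequence x using only information from sequence y, and |A| = 4 for DNA. It approximates C(x||y) with a single frozen order-k finite-context model trained on y, using an additive (alpha) estimator. The parameters `k` and `alpha`, the helper names, and the decision to drop non-ACGT symbols are illustrative assumptions; this is not the benchmark's own implementation, which may rely on a dedicated reference-based DNA compressor.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: only unambiguous nucleotides are modelled


def _clean(seq: str) -> str:
    """Uppercase the sequence and drop any symbol outside A/C/G/T (hypothetical choice)."""
    return "".join(c for c in seq.upper() if c in ALPHABET)


def _context_counts(reference: str, k: int) -> dict:
    """Count which symbol follows each order-k context in the reference."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(reference)):
        counts[reference[i - k:i]][reference[i]] += 1
    return counts


def conditional_bits(target: str, reference: str, k: int = 3, alpha: float = 1.0) -> float:
    """Approximate C(x||y): bits needed to encode target using only a model of reference.

    The model is frozen, so nothing is learned from target itself.
    """
    counts = _context_counts(reference, k)
    bits = 0.0
    for i, symbol in enumerate(target):
        if i < k:
            # Not enough preceding symbols for a full context: charge log2|A| bits.
            bits += math.log2(len(ALPHABET))
            continue
        seen = counts.get(target[i - k:i], {})
        total = sum(seen.values())
        # Additive (alpha) estimator keeps unseen symbols at nonzero probability.
        p = (seen.get(symbol, 0) + alpha) / (total + alpha * len(ALPHABET))
        bits -= math.log2(p)
    return bits


def nrc(target: str, reference: str, k: int = 3, alpha: float = 1.0) -> float:
    """NRC(x||y) = C(x||y) / (|x| * log2|A|).

    Near 0 when y predicts x well; around 1 (or above) when y carries no useful information.
    """
    x, y = _clean(target), _clean(reference)
    if not x:
        raise ValueError("target has no A/C/G/T symbols")
    return conditional_bits(x, y, k, alpha) / (len(x) * math.log2(len(ALPHABET)))


# Usage: compare a reconstructed genome (x) against the genome it should match (y).
print(nrc("ACGT" * 100, "ACGT" * 100))  # small: y predicts x almost perfectly
print(nrc("A" * 400, "ACGT" * 100))     # 1.0: no context of x was ever seen in y
```

A single fixed-order model keeps the example short; practical DNA compressors typically mix several context orders so that both short and long repeats are exploited, which yields tighter estimates of C(x||y).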
