2025 • CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

January 21st, 2025
Version: 4
Department of Biological Sciences, Smith College, Northampton, MA, USA
evolutionary biology
bioRxiv

Rethinking large scale phylogenomics with EukPhylo v1.0, a flexible toolkit to enable phylogeny-informed data curation and analyses of diverse eukaryotic lineages

Katz, L. A. • Cote-L'Heureux, A. E. • Leleu, M. • Ani, G. • Gawron, R.

Eukaryotic diversity is largely microbial, with macroscopic lineages (plants, animals, and fungi) nesting among a plethora of diverse protists. Understanding the evolutionary relationships among eukaryotes is rapidly advancing through omics analyses, but phylogenomic analyses remain challenging for microeukaryotes, particularly uncultivable lineages, because single-cell sequencing approaches generate a mixture of sequences from hosts, associated microbiomes, and contaminants. Moreover, many analyses of eukaryotic gene families and phylogenies rely on boutique datasets and methods that are difficult for other research groups to replicate. To address these challenges, we present EukPhylo v1.0, a modular, user-friendly pipeline that enables effective data curation through phylogeny-informed contamination removal, estimation of homologous gene families (GFs), and generation of both multiple sequence alignments and gene trees. Analyses can use a hook database of ~15k ancient GFs, or users can easily replace this hook with a set of gene families of interest. We demonstrate the power of EukPhylo, including a suite of stand-alone utilities, through analyses of 500 conserved GFs sampled from 1,000 diverse species of eukaryotes, bacteria, and archaea. Through successive rounds of curation using the EukPhylo contamination loop, we show improvements in estimates of the eukaryotic tree of life, recovering clades that are well established in the literature. The final trees corroborate numerous hypotheses in the literature (e.g., Opisthokonta, Rhizaria, Amoebozoa) while challenging others (e.g., CRuMs, Obazoa, Diaphoretickes). We believe that the flexibility and transparency of EukPhylo set standards for the curation of omics data in future studies.
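The abstract does not spell out how phylogeny-informed contamination removal works. Purely as a generic illustration of the idea, and not as EukPhylo's implementation or interface, the sketch below flags a sequence as a candidate contaminant when, in a rooted gene tree, its entire sister group comes from lineages that a user-supplied rule marks as suspect (here, a eukaryotic sequence nesting directly with bacteria). The nested-tuple tree format, the two-letter clade prefix in sequence IDs, the rule format, and all function names are hypothetical assumptions made for this example.

```python
from typing import Dict, Iterator, List, Set, Tuple, Union

# Hypothetical tree format: a leaf is a sequence ID (str); an internal node is
# a tuple of child subtrees. The tree is assumed to be rooted.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf names under a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    names: List[str] = []
    for child in tree:
        names.extend(leaves(child))
    return names


def leaf_sisters(tree: Tree) -> Iterator[Tuple[str, List[str]]]:
    """Yield (leaf, leaves of its sister group) for every leaf in the tree.

    The sister group is everything else attached to the leaf's parent node,
    so a polytomy is treated as one combined sister group.
    """
    if isinstance(tree, str):
        return
    for i, child in enumerate(tree):
        if isinstance(child, str):
            sisters = [name for j, other in enumerate(tree) if j != i
                       for name in leaves(other)]
            yield child, sisters
        else:
            yield from leaf_sisters(child)


def major_clade(seq_id: str) -> str:
    # Assumption: the first two characters of an ID encode the major clade,
    # e.g. "Sr" or "Ba" in the hypothetical IDs used below.
    return seq_id[:2]


def flag_contaminants(tree: Tree, rules: Dict[str, Set[str]]) -> Set[str]:
    """Flag leaves whose entire sister group comes from clades listed as
    suspect for that leaf's own clade in `rules` (hypothetical rule format)."""
    flagged: Set[str] = set()
    for leaf, sisters in leaf_sisters(tree):
        sister_clades = {major_clade(s) for s in sisters}
        suspect = rules.get(major_clade(leaf), set())
        # A non-empty sister group made up only of suspect clades is flagged.
        if sister_clades and sister_clades <= suspect:
            flagged.add(leaf)
    return flagged


# Toy gene tree: a ciliate sequence nests directly with a bacterial sequence.
example_tree = (
    (("Op_me_Hsap_1", "Op_fu_Scer_1"), ("Ba_pr_Ecol_1", "Sr_ci_Tthe_1")),
    ("Am_tu_Ddis_1", "Pl_gr_Atha_1"),
)
# Rule: a eukaryotic sequence whose only sisters are bacterial is suspect.
rules = {clade: {"Ba"} for clade in ("Op", "Sr", "Am", "Pl")}

print(sorted(flag_contaminants(example_tree, rules)))  # prints ['Sr_ci_Tthe_1']
```

A real curation loop would also need to handle unrooted trees, branch support, and sequences that are flagged consistently across many gene trees; those details are deliberately left out of this sketch.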
