April 7th, 2025
Version: 2
UNSW Sydney
evolutionary biology
biorxiv

Exploration and generation of cell transcriptomes over deep evolutionary time

Whole-organism cell atlases have charted the cellular landscapes of individual species; however, comparing cells across the tree of life remains challenging. Most cross-species analyses are restricted to orthologous genes, and the complexity of atlas data is an access barrier for many researchers. We developed a computational strategy to accelerate the exploration of cellular identities at scale. We integrated 30 atlases detailing the expression of 861,013 genes in 2,645,508 animal and plant cells and trained a universal model of cellular transcriptomes to track cell type diversification over evolutionary time. Using transfer learning, we annotated the cell types of a de novo atlas of the insect Cryptocercus punctulatus within minutes. We devised a generative artificial intelligence approach to construct virtual cell atlases from genome sequences and used it to synthesise an atlas of the Tasmanian tiger, extinct since 1936. We then reduced the footprint of extant atlases 100-fold while retaining crucial information and developed interfaces that answer dozens of query types within seconds, speeding up atlas exploration ~50,000-fold.

Similar Papers

biorxiv
Fri Apr 11 2025
Complex genetic determinism of male-fertility restoration in the gynodioecious snail Physa acuta
Male fertility in plants is often controlled by the interaction between mitochondrial and nuclear genes. Some mitotypes confer cytoplasmic male sterility (CMS), making the individual male-sterile, unless the nuclear background contains alleles called restorers, that suppress the effects of CMS and restore the hermaphroditic condition. Restorers in cultivated crops are often alleles with strong and...
Skarlou, E.
Laugier, F.
Bethune, K.
Chenin, T.
...
David, P.
biorxiv
Fri Apr 11 2025
On the ancestry and evolution of the extinct dire wolf
Dire wolves (Aenocyon dirus) are extinct predators of Pleistocene North America. Although phenotypically similar to living wolves (Canis lupus), dire wolves have yet to be placed confidently in the canid family tree. We generated 3.4x and 12.8x paleogenomes from two well-preserved dire wolves dating to > 13,000 and > 72,000 years ago, and estimated consensus species trees for these and 10 canid sp...
Gedman, G.
Morrill Pirovich, K.
Oppenheimer, J.
Hyseni, C.
...
Shapiro, B.
biorxiv
Fri Apr 11 2025
Correlated evolution of the neck, head and forelimb across the theropod-bird transition
Powered flight has required birds to undergo numerous dramatic and coordinated evolutionary responses across the entire body, yet studies are limited to a small number of traits and often exclude a critical component of the vertebrate skeleton, the vertebrae. The neck is a critical region of the avian spine as it operates in tandem with the head as a surrogate forelimb across a diverse array of be...
Marek, R. D.
Felice, R. N.
biorxiv
Fri Apr 11 2025
Eukaryotes evade information storage-replication rate trade-off with endosymbiont assistance leading to larger genomes
Genome length varies widely among organisms, from compact genomes of prokaryotes to vast and complex genomes of eukaryotes. In this study, we theoretically identify the evolutionary pressures that may have driven this divergence in genome length. We use a parameter-free model to study genome length evolution under selection pressure to minimize replication time and maximize information storage cap...
Subramanian, H.
Sahu, P.
Barik, S.
Ghosh, K.
biorxiv
Fri Apr 11 2025
Economic analysis of disease and control of multi-field epidemics in agriculture
Epidemics of plant diseases are estimated to cause significant economic losses in crop production. Fungicide applications are widely used to control crop diseases but incur substantial indirect costs. One essential class of indirect costs arises due to the evolution of fungicide resistance. This indirect cost must be estimated reliably to design economic policy for more sustainable use of fungicid...
Mikaberidze, A.
Gokhale, C. S.
Bargues-Ribera, M.
Verma, P.
biorxiv
Thu Apr 10 2025
NALCN/Cch1 channelosome subunits originated in early eukaryotes and are fully conserved in animals, fungi, and apusomonads
The sodium leak channel NALCN, a key regulator of neuronal excitability, associates with three ancillary subunits that are critical for its function: an extracellular subunit called FAM155, and two cytoplasmic subunits called UNC79 and UNC80. Interestingly, NALCN and FAM155 have orthologous phylogenetic relationships with the fungal calcium channel Cch1 and its extracellular subunit Mid1, however,...
Senatore, A.
Mayorova, T. D.
Yanez Guerra, L. A.
Elkhatib, W.
...
Monteil, A.
biorxiv
Thu Apr 10 2025
Parallel erosion of a testis-specific Na+/K+ ATPase in three mammalian lineages sheds light into the evolution of spermatozoa energetics
Understanding how extant physiological landscapes arise from novel genetic interactions is key to elucidating phenotypic evolution. Sperm cells exemplify a striking case of functional compartmentalization shaped by molecular adjustments, notably regarding energy metabolism. Here, we examine the impact of gene duplication and loss on the evolution of sperm energetics in mammals. Our findings reveal...
Valente, R.
Machado, A.
Pericuesta, E.
García-Párraga, D.
...
Castro, F.
biorxiv
Thu Apr 10 2025
The Evolutionary Flexibility of the Drosophila Circadian Clock: Network Constraints or Adaptive Freedom?
The study of network evolution is critical to understanding how complex biological processes arise and adapt over time. Protein networks, composed of interacting components, can exhibit varying degrees of conservation and flexibility, enabling organisms to fine-tune their responses to environmental changes. Using the circadian clock system in Drosophila as a case study, we explore how such netwo...
Creasey, L. D.
Tauber, E.
biorxiv
Thu Apr 10 2025
Quartet-based Genome-scale Species Tree Inference using Multicopy Gene Family Trees
Species tree estimation from genome-wide data has transformed evolutionary studies, particularly in the presence of gene tree discordance. Gene trees often differ from species trees due to factors like incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Quartet-based species tree estimation methods have gained substantial popularity for their accuracy and statistical guarantee...
Rafi, A.
Rumi, A. M. S.
Hakim, S. A.
Bayzid, M. S.
biorxiv
Thu Apr 10 2025
Aevol-9: A simulation platform to decipher the evolution of genome architecture
Aevol is a forward-in-time simulator of genome architecture. It simulates a population of individuals, each with an explicit genome whose sequence and architecture can be modified by various mutational operators, including substitutions, indels, and large-scale chromosomal rearrangements. This enables performing in silico experiments to decipher the effects of evolutionary conditions (e.g. populat...
Luiselli, J.
Parsons, D.
Galle, R.
Banse, P.
...
Beslon, G.