Exploration and generation of cell transcriptomes over deep evolutionary time
Whole-organism cell atlases have mapped the cellular landscapes of individual species; however, comparing cells across the tree of life remains challenging. Most cross-species analyses are restricted to orthologous genes, and the complexity of atlas data is an access barrier for many researchers. We developed a computational strategy to accelerate the exploration of cellular identities at scale. We integrated 30 atlases detailing the expression of 861,013 genes by 2,645,508 animal and plant cells, and trained a universal model of cellular transcriptomes to track cell type diversification over evolutionary time. Transfer learning achieved cell type annotation of a de novo atlas of the insect Cryptocercus punctulatus within minutes. We devised a generative artificial intelligence approach to construct virtual cell atlases from genome sequences and used it to synthesise an atlas of the Tasmanian tiger, extinct since 1936. We then reduced the footprint of extant atlases 100-fold while retaining crucial information, and developed interfaces that answer dozens of query types within seconds, accelerating atlas exploration ~50,000-fold.