Deep learning has made strides in modeling protein sequences but often struggles to generalize beyond its training distribution. Current models learn from individual sequences through masked language modeling, but effective protein sequence analysis demands the ability to reason across sequences, a critical step in phylogenetic analysis. Training biological foundation models explicitly for inter-sequence reasoning could improve their generalizability and their performance on phylogenetic inference and other tasks in computational biology. Here, we report the ongoing development of Phyla, an architecture that operates on an explicit, higher-level semantic representation of phylogenetic trees. Phyla employs a hybrid state-space transformer architecture and a novel tree loss function to achieve state-of-the-art performance on sequence reasoning benchmarks and phylogenetic tree reconstruction. To validate Phyla's capabilities, we applied it to reconstruct the tree of life, where Phyla accurately reclassified archaeal organisms, such as Lokiarchaeota, as more closely related to bacteria, aligning with recent phylogenetic insights. Phyla represents a step toward molecular sequence reasoning, emphasizing structured reasoning over memorization and advancing protein sequence analysis and phylogenetic inference.