2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 1st, 2025
Version: 3
University of Virginia
systems biology
biorxiv

Uncovering the domain language of protein functionality and cell phenotypes using DANSy

Shimpi, A. A. • Naegle, K. M.

Evolution has developed a set of principles that determine feasible domain combinations, analogous to grammar in natural languages. Treating domains as words and proteins as sentences made up of domain words, we apply a linguistic approach to represent the human proteome as an n-gram network, which we hereafter call Domain Architecture Network Syntax (DANSy). Combining DANSy with network theory, we explore the abstract rules of domain word combinations within the human proteome and identify connections that determine feasible protein functionality. We analyze the entropic information content of these domain word connections to establish a DANSy network that balances recovering most of the proteome against n-gram complexity. Additionally, we explore subnetwork languages by focusing on reversible post-translational modification (PTM) systems that follow a reader-writer-eraser paradigm. We find that PTM systems appear to sample grammar rules near the onset of system expansion, but then converge towards similar grammar rules, which stabilize during the post-metazoan switch. For example, reader and writer domains are typically tightly connected through shared n-grams, but eraser domains are almost always loosely connected to, or completely disconnected from, readers and writers. Additionally, after grammar fixation, domains with verb-like properties, such as writers and erasers, never appear together, consistent with the idea that natural grammar promotes clarity and, in this case, limits futile enzymatic cycles. Because some cancer fusion genes could give rise to novel language, we investigate how cancer fusion genes alter the human proteome n-gram network. We find that most cancer fusion genes follow existing grammar rules. Finally, we adapt DANSy for differential expression analysis (deDANSy) to determine how coordinated changes in domain language syntax relate to cell phenotypes. We applied deDANSy to RNA-sequencing data from SOX10-deficient melanoma cells, finding that network separation and syntax enrichment can characterize the molecular basis of cell phenotypes and identify novel information distinct from gene set enrichment analysis (GSEA) approaches. Collectively, these results suggest that n-gram based analysis of proteomes complements direct protein interaction approaches, is more fully described than protein-protein interaction networks, and can provide unique insights for signaling pathway enrichment analysis.
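To make the "domains as words, proteins as sentences" idea concrete, the following is a minimal Python sketch, not the authors' implementation. It extracts contiguous domain n-grams from a few toy domain architectures, links n-grams that occur in the same protein, and computes a Shannon entropy over n-gram counts. The protein names, the architectures, the co-occurrence edge rule, and the function names (`ngrams`, `build_network`, `shannon_entropy`) are all assumptions chosen for illustration; the abstract does not specify how DANSy defines edges or its entropy measure.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy architectures: protein name -> ordered list of domain "words".
proteins = {
    "P1": ["SH3", "SH2", "Kinase"],
    "P2": ["SH2", "Kinase"],
    "P3": ["SH3", "SH2"],
    "P4": ["PH", "Kinase"],
}


def ngrams(domains, max_n):
    """Return the set of contiguous domain n-grams (as tuples) of length 1..max_n."""
    found = set()
    for n in range(1, max_n + 1):
        for start in range(len(domains) - n + 1):
            found.add(tuple(domains[start:start + n]))
    return found


def build_network(architectures, max_n):
    """Build an n-gram network from domain architectures.

    Nodes are n-grams, counted once per protein that contains them.
    Edges link two n-grams found in the same protein (an assumed rule,
    used here only for illustration).
    """
    node_counts = Counter()
    edges = set()
    for domains in architectures.values():
        grams = ngrams(domains, max_n)
        node_counts.update(grams)
        # Sorting gives each undirected edge a single canonical orientation.
        for a, b in combinations(sorted(grams), 2):
            edges.add((a, b))
    return node_counts, edges


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution implied by the counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


nodes, edges = build_network(proteins, max_n=2)
entropy_bits = shannon_entropy(nodes)
```

Raising `max_n` adds longer n-grams as nodes, which is one simple way to see the trade-off the abstract describes between describing more of the proteome and increasing n-gram complexity.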
