2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 21st, 2025
Version: 7
Department of Life Sciences, Faculty of Agriculture, Ryukoku University
genetics
biorxiv

Applying gradient tree boosting to QTL mapping with Shapley additive explanations

Ishibashi, T. • Onogi, A.

Mapping quantitative trait loci (QTLs) is one of the major goals of quantitative genetics; however, identifying interactions between QTLs (i.e., epistasis) remains challenging. Recently developed machine learning methods, such as deep learning and gradient boosting, are now used across many fields. These methods could advance QTL mapping methodologies because they are highly capable of capturing complex relationships among features. One problem with applying such complex models to QTL mapping is the evaluation of feature importance. In this study, XGBoost, a popular gradient tree boosting algorithm, was applied to QTL mapping in biparental populations together with Shapley additive explanations (SHAP). SHAP is a local (i.e., instance-wise) importance index with properties that are desirable in a feature importance index. The SHAP-assisted XGBoost (SHAP-XGB) was compared with conventional methods, including composite interval mapping (CIM), multiple interval mapping (MIM), inclusive CIM (ICIM), and BayesC, using simulations and rice heading date data. SHAP-XGB performed comparably to CIM, MIM, ICIM, and BayesC in mapping main QTL effects and was superior to MIM, ICIM, and BayesC in mapping QTL interaction effects. Because SHAP evaluates local importance, interactions between markers can be visualized by plotting the SHAP interaction values for each instance (plant/line). These results illustrate the strength of SHAP-XGB in detecting and interpreting epistatic QTLs and suggest that SHAP-XGB can complement conventional methods.
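To make the idea of instance-wise importance and SHAP interaction values concrete, the following is a minimal sketch that computes exact Shapley values and pairwise Shapley interaction values by brute-force enumeration. It is not the authors' pipeline and it does not use XGBoost: the three-marker phenotype function `toy_phenotype`, its effect sizes, the -1/1 genotype coding, and the balanced background of all eight genotypes are assumptions made purely for illustration. Absent markers are averaged over the background genotypes, which is one common convention for defining the value of a coalition.

```python
from itertools import combinations, product
from math import factorial


def toy_phenotype(g):
    # Hypothetical model: marker 0 is purely additive,
    # markers 1 and 2 affect the trait only through their product (epistasis).
    return 2.0 * g[0] + 1.0 * g[1] * g[2]


def coalition_value(f, x, background, subset):
    # Mean prediction when markers in `subset` are fixed to the instance's
    # genotype and the other markers come from each background genotype.
    n = len(x)
    preds = [f([x[i] if i in subset else b[i] for i in range(n)])
             for b in background]
    return sum(preds) / len(preds)


def shapley_values(f, x, background):
    # Exact Shapley value of every marker for one instance (plant/line).
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                gain = (coalition_value(f, x, background, s | {i})
                        - coalition_value(f, x, background, s))
                phi[i] += weight * gain
    return phi


def shapley_interaction(f, x, background, i, j):
    # Off-diagonal Shapley interaction value for markers i != j.
    # The pair's total interaction is split equally between (i, j) and (j, i).
    n = len(x)
    others = [m for m in range(n) if m not in (i, j)]
    total = 0.0
    for k in range(n - 1):
        weight = factorial(k) * factorial(n - k - 2) / (2 * factorial(n - 1))
        for s in combinations(others, k):
            s = set(s)
            v = lambda extra: coalition_value(f, x, background, s | extra)
            total += weight * (v({i, j}) - v({i}) - v({j}) + v(set()))
    return total


# Balanced background: every combination of the two homozygous genotypes.
background = list(product([-1, 1], repeat=3))

for line in ([1, 1, 1], [1, 1, -1]):
    print(line,
          shapley_values(toy_phenotype, line, background),
          shapley_interaction(toy_phenotype, line, background, 1, 2))
```

In this toy setting, the interaction value for markers 1 and 2 changes sign between the two example lines, while marker 0 shows no interaction with either. That instance-wise behavior is what a per-line plot of SHAP interaction values would reveal. Brute-force enumeration scales exponentially with the number of markers, so it is only practical for a handful of features; tree-specific algorithms exist for real marker panels.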
