2025 Hyper Recent •CC0 1.0 Universal

This work is dedicated to the public domain. No rights reserved.

July 21st, 2025
Version: 1
INRAE
genetics
biorxiv

WISER: an innovative and efficient method for correcting population structure in omics-based prediction and selection

Jacquin, L. • Guerra, W. • Lewandowski, M. • Patocchi, A. • Rymenants, M. • Durel, C.-E. • Laurens, F. • Lozano, L. • Aranzana, M. J. • Muranty, H.

This work introduces WISER (whitening and successive least squares estimation refinement), an innovative and efficient method designed to enhance phenotype estimation by addressing population structure. WISER outperforms traditional methods such as least squares (LS) means and best linear unbiased prediction (BLUP) in phenotype estimation, offering a more accurate approach for omics-based selection and having the potential to improve association studies. Unlike existing approaches that correct for population structure, WISER provides a generalized framework applicable across diverse experimental setups, species, and omics datasets, including single nucleotide polymorphisms (SNPs), metabolomics, and near-infrared spectroscopy (NIRS) used as phenomic predictors. Central to WISER is the concept of whitening, a statistical transformation that removes correlations between variables and standardizes their variances. Within its framework, WISER extends classical methods that use eigen-information as fixed-effect covariates to correct for population structure, by relaxing their assumptions and implementing a true whitening matrix instead of a pseudo-whitening matrix. This approach corrects fixed effects (e.g., environmental effects) for the genetic covariance structure embedded within the experimental design, thereby minimizing confounding factors between fixed and genetic effects. To support its practical application, a user-friendly R package named wiser has been developed. The WISER method has been employed in analyses for genomic prediction and heritability estimation across four species and 33 traits using multiple datasets, including rice, maize, apple, and Scots pine. Results indicate that genomic predictive abilities based on WISER-estimated phenotypes consistently outperform the LS-means and BLUP approaches for phenotype estimation, regardless of the predictive model applied. 
This underscores WISER's potential to advance omics analyses and related research fields by capturing stronger genetic signals.
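
To make the notion of whitening mentioned in the abstract concrete, here is a minimal Python sketch of a generic whitening transformation on toy data. It is not the WISER algorithm and does not use the wiser R package: the function name `whitening_matrix`, the simulated data, and the choice of the symmetric (ZCA) inverse square root of the covariance matrix are all assumptions made purely for illustration.

```python
import numpy as np


def whitening_matrix(cov, eps=1e-10):
    """Return a symmetric matrix W such that W @ cov @ W.T is close to the identity.

    Hypothetical helper for illustration: uses the eigendecomposition of a
    symmetric covariance matrix (ZCA whitening).
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Guard against division by zero for (near-)singular covariance matrices.
    eigvals = np.clip(eigvals, eps, None)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


# Toy data (an assumption): 500 observations of 3 correlated variables.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
x = rng.standard_normal((500, 3)) @ mixing.T

# Center the data, estimate its covariance, and whiten.
x_centered = x - x.mean(axis=0)
cov = np.cov(x_centered, rowvar=False)
W = whitening_matrix(cov)
z = x_centered @ W.T

# After whitening, the sample covariance is the identity up to rounding error:
# correlations are removed and every variable has unit variance.
cov_z = np.cov(z, rowvar=False)
```

Other whitening matrices (for example, one built from a Cholesky factor of the covariance) satisfy the same identity-covariance property; the symmetric form is used here only because it is compact to write.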
