Back to Search Start Over

Exploring Machine Learning Algorithms to Unveil Genomic Regions Associated With Resistance to Southern Root-Knot Nematode in Soybeans

Authors :
Caio Canella Vieira
Jing Zhou
Mariola Usovsky
Tri Vuong
Amanda D. Howland
Dongho Lee
Zenglu Li
Jianfeng Zhou
Grover Shannon
Henry T. Nguyen
Pengyin Chen
Source :
Frontiers in Plant Science, Vol 13 (2022)
Publication Year :
2022
Publisher :
Frontiers Media S.A., 2022.

Abstract

Southern root-knot nematode [SRKN, Meloidogyne incognita (Kofold & White) Chitwood] is a plant-parasitic nematode challenging to control due to its short life cycle, a wide range of hosts, and limited management options, of which genetic resistance is the main option to efficiently control the damage caused by SRKN. To date, a major quantitative trait locus (QTL) mapped on chromosome (Chr.) 10 plays an essential role in resistance to SRKN in soybean varieties. The confidence of discovered trait-loci associations by traditional methods is often limited by the assumptions of individual single nucleotide polymorphisms (SNPs) always acting independently as well as the phenotype following a Gaussian distribution. Therefore, the objective of this study was to conduct machine learning (ML)-based genome-wide association studies (GWAS) utilizing Random Forest (RF) and Support Vector Machine (SVM) algorithms to unveil novel regions of the soybean genome associated with resistance to SRKN. A total of 717 breeding lines derived from 330 unique bi-parental populations were genotyped with the Illumina Infinium BARCSoySNP6K BeadChip and phenotyped for SRKN resistance in a greenhouse. A GWAS pipeline involving a supervised feature dimension reduction based on Variable Importance in Projection (VIP) and SNP detection based on classification accuracy was proposed. Minor effect SNPs were detected by the proposed ML-GWAS methodology but not identified using Bayesian-information and linkage-disequilibrium Iteratively Nested Keyway (BLINK), Fixed and Random Model Circulating Probability Unification (FarmCPU), and Enriched Compressed Mixed Linear Model (ECMLM) models. Besides the genomic region on Chr. 10 that can explain most of SRKN resistance variance, additional minor effects SNPs were also identified on Chrs. 10 and 11. The findings in this study demonstrated that overfitting in GWAS may lead to lower prediction accuracy, and the detection of significant SNPs based on classification accuracy limited false-positive associations. The expansion of the basis of the genetic resistance to SRKN can potentially reduce the selection pressure over the major QTL on Chr. 10 and achieve higher levels of resistance.

Details

Language :
English
ISSN :
1664462X
Volume :
13
Database :
Directory of Open Access Journals
Journal :
Frontiers in Plant Science
Publication Type :
Academic Journal
Accession number :
edsdoj.1c3642c972614283bfbe0e5d7c01ee91
Document Type :
article
Full Text :
https://doi.org/10.3389/fpls.2022.883280