Machine Learning as an Effective Method for Identifying True Single Nucleotide Polymorphisms in Polyploid Plants

Authors :
Walid Korani
Josh P. Clevenger
Ye Chu
Peggy Ozias-Akins
Source :
The Plant Genome, Vol 12, Iss 1 (2019)
Publication Year :
2019
Publisher :
Wiley, 2019.

Abstract

Single nucleotide polymorphisms (SNPs) have many advantages as molecular markers since they are ubiquitous and codominant. However, the discovery of true SNPs in polyploid species is difficult. Peanut (Arachis hypogaea L.) is an allopolyploid with a very low rate of true SNP calling. A large set of true and false SNPs identified from the Axiom_Arachis 58k array was leveraged to train machine-learning models that identify true SNPs directly from sequence data, reducing ascertainment bias. These models achieved accuracy rates above 80% using real peanut RNA sequencing (RNA-seq) and whole-genome shotgun (WGS) resequencing data, which is higher than previously reported for polyploids and at least a twofold improvement for peanut. A 48K SNP array, Axiom_Arachis2, was designed using this approach, resulting in 75% accuracy of calling SNPs from different tetraploid peanut genotypes. When the method was used to simulate SNP variation in several polyploids, models achieved >98% accuracy in selecting true SNPs. Additionally, models built with simulated genotypes were able to select true SNPs at >80% accuracy using real peanut data. This work accomplished the objective of creating an effective approach for calling highly reliable SNPs from polyploids using machine learning. A novel tool, designated SNP machine learning (SNP-ML), was developed for predicting true SNPs from sequence data using the described models. SNP-ML additionally provides functionality, designated SNP machine learner (SNP-MLer), to train new models not included in this study for customized use. SNP-ML is publicly available.
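
The general idea in the abstract (fit a classifier on SNP calls already labeled true or false by an array, then score new calls by accuracy) can be illustrated with a minimal Python sketch. This is not the SNP-ML implementation: the abstract does not state which features or model types were used, so the two features (alternate-allele fraction and normalized read depth), the nearest-centroid classifier, and all values below are assumptions chosen only for illustration.

```python
# Hypothetical sketch only: NOT the SNP-ML code or its feature set.
# Feature names and every number below are invented for illustration.
from statistics import mean


def fit_centroids(features, labels):
    """Return the mean feature vector (centroid) for each class label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, row):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, features, labels):
    """Fraction of calls whose predicted label matches the known label."""
    hits = sum(predict(centroids, f) == y for f, y in zip(features, labels))
    return hits / len(labels)


# Toy training set: (alt_allele_fraction, normalized_depth).
# True = call treated as a validated SNP, False = call treated as a false SNP.
train_X = [(0.50, 1.0), (0.45, 0.9), (0.55, 1.1),
           (0.25, 2.0), (0.20, 1.9), (0.30, 2.1)]
train_y = [True, True, True, False, False, False]

model = fit_centroids(train_X, train_y)

# Held-out toy calls with known labels, used to estimate accuracy.
test_X = [(0.48, 1.0), (0.22, 2.0)]
test_y = [True, False]
print(accuracy(model, test_X, test_y))
```

In practice the labels would come from array validation and the features from the sequence alignments; a nearest-centroid rule is used here only because it needs no third-party library.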

Details

Language :
English
ISSN :
1940-3372
Volume :
12
Issue :
1
Database :
Directory of Open Access Journals
Journal :
The Plant Genome
Publication Type :
Academic Journal
Accession number :
edsdoj.0dbd95ff923443daa45b5a77cd579a0b
Document Type :
article
Full Text :
https://doi.org/10.3835/plantgenome2018.05.0023