Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes
- Source :
- Genetics Selection Evolution, Vol 52, Iss 1, Pp 1-15 (2020)
- Publication Year :
- 2020
- Publisher :
- BMC, 2020.
Abstract
- Background: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.
- Methods: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000).
- Results: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using mean squared error of prediction. The simulations indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.
- Conclusions: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
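The abstract ranks methods by predictive correlation and by mean squared error of prediction. As a minimal sketch of how these two metrics could be computed, assuming predictive correlation means the Pearson correlation between observed and predicted phenotypes in a held-out set: the function names and the toy values below are illustrative assumptions and are not taken from the study.

```python
import math


def predictive_correlation(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


# Toy values invented for illustration only; they are not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

r = predictive_correlation(observed, predicted)
msep = mean_squared_error(observed, predicted)
print(f"predictive correlation = {r:.3f}, MSEP = {msep:.3f}")
```

Higher correlation and lower mean squared error both indicate better prediction, which is why the abstract can report the same ranking of methods under either metric.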
- Subjects :
- Animal culture
SF1-1100
Genetics
QH426-470
Details
- Language :
- German, English, French
- ISSN :
- 1297-9686
- Volume :
- 52
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- Genetics Selection Evolution
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.fe1aa84e32b8471ea44c96bdb0e12236
- Document Type :
- article
- Full Text :
- https://doi.org/10.1186/s12711-020-00531-z