
Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes

Authors :
Rostam Abdollahi-Arpanahi
Daniel Gianola
Francisco Peñagaricano
Source :
Genetics Selection Evolution, Vol 52, Iss 1, Pp 1-15 (2020)
Publication Year :
2020
Publisher :
BMC, 2020.

Abstract

Background: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.

Methods: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000).

Results: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.

Conclusions: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
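
The two evaluation metrics named in the abstract (predictive correlation and mean squared error of prediction) and the additive simulation scenario (heritability 0.30, 100 quantitative trait nucleotides) can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes are randomly generated rather than taken from the Holstein data, ridge regression on SNPs is swapped in as a simple linear stand-in for the parametric methods, the penalty is a common heuristic, and all function names, sample sizes and the train/test split are hypothetical.

```python
import numpy as np


def simulate_additive_phenotype(genotypes, n_qtn, h2, rng):
    """Simulate phenotypes with purely additive effects at n_qtn random loci."""
    n, p = genotypes.shape
    qtn = rng.choice(p, size=n_qtn, replace=False)  # hypothetical causal loci
    effects = rng.normal(size=n_qtn)                # assumed normal QTN effects
    g = genotypes[:, qtn] @ effects                 # true genetic values
    # Residual variance chosen so that var_g / (var_g + var_e) equals h2.
    var_e = g.var() * (1.0 - h2) / h2
    e = rng.normal(scale=np.sqrt(var_e), size=n)
    return g + e


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centered SNP codes (stand-in linear predictor)."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    y_mean = y_train.mean()
    # Dual form: solve an n x n system, convenient when markers exceed individuals.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - y_mean)
    beta = Xc.T @ alpha
    return y_mean + (X_test - mu) @ beta


def predictive_correlation(y_obs, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


def mse_prediction(y_obs, y_pred):
    """Mean squared error of prediction."""
    return float(np.mean((np.asarray(y_obs) - np.asarray(y_pred)) ** 2))


rng = np.random.default_rng(1)
n, p, h2 = 600, 1000, 0.30                          # toy sizes, not the study's
freqs = rng.uniform(0.1, 0.9, size=p)               # assumed allele frequencies
genotypes = rng.binomial(2, freqs, size=(n, p)).astype(float)  # 0/1/2 coding
y = simulate_additive_phenotype(genotypes, n_qtn=100, h2=h2, rng=rng)

train, test = np.arange(0, 480), np.arange(480, n)  # hypothetical split
# Heuristic penalty: ratio of residual to per-marker variance under h2.
lam = (1.0 - h2) / h2 * genotypes[train].var(axis=0).sum()
y_hat = ridge_predict(genotypes[train], y[train], genotypes[test], lam)

r = predictive_correlation(y[test], y_hat)
mse = mse_prediction(y[test], y_hat)
print(f"predictive correlation: {r:.3f}, MSE of prediction: {mse:.3f}")
```

Both metrics are computed on individuals held out from model fitting, which is what makes them measures of predictive rather than goodness-of-fit performance.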

Details

Language :
German, English, French
ISSN :
1297-9686
Volume :
52
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Genetics Selection Evolution
Publication Type :
Academic Journal
Accession number :
edsdoj.fe1aa84e32b8471ea44c96bdb0e12236
Document Type :
article
Full Text :
https://doi.org/10.1186/s12711-020-00531-z