Genetic Classification of Populations using Supervised Learning
- Source :
- PLoS ONE, Vol 6, Iss 5, p e14802 (2011), PLoS ONE
- Publication Year :
- 2010
- Publisher :
- arXiv, 2010.
Abstract
- There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.
- Comment: Accepted PLOS One
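The supervised idea in the abstract (train on individuals whose population is known, then measure how well held-out individuals are assigned) can be sketched in a few lines. This is an illustrative toy only, not the paper's pipeline: it uses a logistic-regression classifier (in effect a single-layer neural network) as a deliberately simpler stand-in for the neural networks and support vector machines the authors apply, and it runs on simulated genotypes rather than the Scottish and Bulgarian data. All names (`simulate_population`, `run_demo`), the number of SNPs, the sample sizes and the allele-frequency shift of 0.1 are assumptions chosen for the example; real differences between closely related populations are far smaller.

```python
import math
import random


def simulate_population(freqs, n, rng):
    """Draw n individuals; each genotype is a minor-allele count in {0, 1, 2}."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def center(rows, means):
    """Subtract the per-SNP training mean from every genotype."""
    return [[x - m for x, m in zip(row, means)] for row in rows]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, rng, epochs=20, lr=0.01):
    """Fit logistic regression by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(score) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, row):
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b > 0 else 0


def run_demo(seed=0, n_snps=200, n_per_pop=150, n_test_per_pop=50, shift=0.1):
    """Return held-out classification accuracy for two simulated populations."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies: population B differs from A by +/- shift.
    freqs_a = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    freqs_b = [p + rng.choice([-shift, shift]) for p in freqs_a]
    pop_a = simulate_population(freqs_a, n_per_pop, rng)
    pop_b = simulate_population(freqs_b, n_per_pop, rng)

    # The labels (0 = A, 1 = B) are the prior knowledge a supervised method uses.
    n_train = n_per_pop - n_test_per_pop
    train_X = pop_a[:n_train] + pop_b[:n_train]
    train_y = [0] * n_train + [1] * n_train
    test_X = pop_a[n_train:] + pop_b[n_train:]
    test_y = [0] * n_test_per_pop + [1] * n_test_per_pop

    # Centre with training-set means only, so the test set stays unseen.
    means = [sum(col) / len(train_X) for col in zip(*train_X)]
    w, b = train_logistic(center(train_X, means), train_y, rng)
    hits = sum(predict(w, b, row) == label
               for row, label in zip(center(test_X, means), test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

Held-out accuracy clearly above 0.5 indicates that the two labelled groups are distinguishable. Lowering `shift` or `n_snps` shows how the signal fades as the populations become more similar.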
- Subjects :
- Population genetics
Genome-wide association study
Quantitative Biology - Quantitative Methods
0302 clinical medicine
Bulgaria
Genetics and Genomics/Genetics of Disease
Quantitative Methods (q-bio.QM)
Genetics
Principal Component Analysis
0303 health sciences
Multidisciplinary
Artificial neural network
Principal component analysis
Medicine
Research Article
Science
Population
Genetic differences
Genetics and Genomics/Complex Traits
Biology
Machine learning
Polymorphism, Single Nucleotide
03 medical and health sciences
Genetics and Genomics/Population Genetics
Humans
Learning
education
QH426
030304 developmental biology
Supervised learning
R1
Support vector machine
Genetics, Population
Scotland
Case-Control Studies
FOS: Biological sciences
Artificial intelligence
Computational Biology/Population Genetics
Nerve Net
Scale (map)
business
computer
030217 neurology & neurosurgery
Genome-Wide Association Study
Details
- ISSN :
- 19326203
- Database :
- OpenAIRE
- Journal :
- PLoS ONE, Vol 6, Iss 5, p e14802 (2011), PLoS ONE
- Accession number :
- edsair.doi.dedup.....4ed9ed08e10bc6bcee8c55263981486a
- Full Text :
- https://doi.org/10.48550/arxiv.1012.3555