Estimating the information value of polymorphic sites using pooled sequences
- Source :
- BMC Genomics
- Publication Year :
- 2014
- Publisher :
- Springer Science and Business Media LLC, 2014.
Abstract
- High-throughput sequencing is a cost-effective method for identifying genetic variation, and it is currently in use on a large scale across the field of biology, including ecology and population genetics. Correctly identifying variable sites and allele frequencies from sequencing data remains challenging, in large part due to artifacts and biases inherent in the sequencing process. Selecting variants that are diagnostic is commonly done using diversity statistics like F_ST, but these measures are not ideal for the task. Here, we develop a method that directly calculates the expected amount of information gained from observing each variant site. We then develop and implement a conservative estimator that takes into account uncertainty introduced by sampling bias and sequencing error. This estimator is applied to simulated and real sequencing data, and we discuss how it performs compared to commonly used existing methods for identifying diagnostic polymorphisms. The expected information content gives an easy-to-interpret measure of the usefulness of variant sites. The results show that we achieve a clear separation between true variants and noise, allowing us to select candidate sites with a high degree of confidence.
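To make the idea of "expected information gained from observing a variant site" concrete, the following is a minimal sketch and not the authors' estimator. It assumes a biallelic site, two populations with equal prior probability, and known allele frequencies; under those assumptions the expected information about population membership from one observed allele is the mutual information between population and allele, measured in bits. The function names `binary_entropy` and `expected_site_information` are hypothetical, and the sketch ignores the sampling bias and sequencing error that the abstract says the published estimator accounts for.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a biallelic site with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fixed allele carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_site_information(p_a: float, p_b: float) -> float:
    """Expected bits gained about population membership from one observed allele.

    Assumption (not from the source): two populations, A and B, with equal
    prior probability and exactly known allele frequencies p_a and p_b.
    The result is the mutual information between population and allele.
    """
    p_mix = 0.5 * (p_a + p_b)  # allele frequency in the pooled populations
    return binary_entropy(p_mix) - 0.5 * (binary_entropy(p_a) + binary_entropy(p_b))


if __name__ == "__main__":
    # Illustrative frequency pairs only; these are not data from the paper.
    for p_a, p_b in [(1.0, 0.0), (0.9, 0.1), (0.3, 0.3)]:
        info = expected_site_information(p_a, p_b)
        print(f"p_a={p_a:.1f} p_b={p_b:.1f} -> {info:.3f} bits")
```

Under these assumptions a site fixed for different alleles in the two populations yields one full bit, and a site with identical frequencies yields zero, which is what makes the measure easy to interpret as a ranking score for candidate diagnostic sites.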
- Subjects :
- Population genetics
Datasets as Topic
diagnostic SNPs
Biology
computer.software_genre
Information theory
Polymorphism, Single Nucleotide
Field (computer science)
Gene Frequency
Genetics
Animals
information theory
Sampling bias
Polymorphism, Genetic
Research
Computational Biology
Estimator
Genomics
SNP identification
Identifying Variable
Noise (video)
Data mining
Scale (map)
expected site information
computer
Algorithms
Software
Biotechnology
Details
- ISSN :
- 1471-2164
- Volume :
- 15
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....0eb8e9c2c8c9457c2344d46a0d08c6eb
- Full Text :
- https://doi.org/10.1186/1471-2164-15-s6-s20