Evaluating the quality of the 1000 genomes project data

Authors :: Andrew S. Peterson
Subhra Chaudhuri
Michal Sakin-Levy
Jeffrey D. Wall
Ming Xiao
Pui-Yan Kwok
Steffen Durinck
Yulia Mostovoy
Saurabh Belsare
Somasekar Seshagiri
Source :: BMC Genomics, Vol 20, Iss 1, Pp 1-14 (2019)
Publication Year :: 2019
Publisher :: eScholarship, University of California, 2019.
Abstract: Background: Data from the 1000 Genomes project is quite often used as a reference for human genomic analysis. However, its accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotyping, phasing, and imputation accuracy data in the 1000 Genomes project. We compare the phased haplotype calls from the 1000 Genomes project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform. Results: We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes project data. Further, it appears that using a population specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account. Conclusions: The quality of the 1000 Genomes data needs to be considered while using this database for further studies. This work presents an analysis that can be used for these assessments. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5957-x) contains supplementary material, which is available to authorized users.

Subjects :: 0106 biological sciences
Computer science
computer.software_genre
01 natural sciences
Medical and Health Sciences
0302 clinical medicine
Gene Frequency
Population specific
Human Genome Project
Imputation (statistics)
0303 health sciences
Genome
Continental Population Groups
Phasing
High-Throughput Nucleotide Sequencing
Single Nucleotide
Biological Sciences
Scientific Experimental Error
Data mining
Research Article
Biotechnology
Human
lcsh:QH426-470
Bioinformatics
Data needs
lcsh:Biotechnology
Genomics
Biology
Polymorphism, Single Nucleotide
03 medical and health sciences
1000 genomes
lcsh:TP248.13-248.65
Information and Computing Sciences
Error reporting
Genetics
Humans
1000 Genomes Project
Polymorphism
Genotyping
030304 developmental biology
Imputation
Genome, Human
Racial Groups
Human Genome
lcsh:Genetics
Haplotypes
Sample size determination
genomes
computer
030217 neurology & neurosurgery
Imputation (genetics)
010606 plant biology & botany

Details

Database :: OpenAIRE
Journal :: BMC Genomics, Vol 20, Iss 1, Pp 1-14 (2019)
Accession number :: edsair.doi.dedup.....92b906a19b0d090fb8438979a51dfc78
