Detecting sample swaps in diverse NGS data types using linkage disequilibrium
- Source :
- Nature Communications, Vol 11, Iss 1, Pp 1-8 (2020)
- Publication Year :
- 2020
- Publisher :
- Nature Publishing Group UK, 2020.
Abstract
- As the number of genomics datasets grows rapidly, sample mislabeling has become a high-stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.
- Parallelized analysis in clinical genomics can lead to sample or data mislabelling, which could have serious downstream consequences. Here the authors present a tool to quantify sample genetic relatedness and detect such mistakes, and apply it to thousands of datasets from the ENCODE consortium.
- Subjects :
- 0301 basic medicine
Linkage disequilibrium
Genotype
Computer science
Science
General Physics and Astronomy
Genomics
Sample (statistics)
Computational biology
computer.software_genre
ENCODE
Data type
General Biochemistry, Genetics and Molecular Biology
Article
03 medical and health sciences
0302 clinical medicine
Human Umbilical Vein Endothelial Cells
Humans
lcsh:Science
030304 developmental biology
Clinical genomics
0303 health sciences
Multidisciplinary
RNA
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation
General Chemistry
3. Good health
Metadata
030104 developmental biology
ComputingMethodologies_PATTERNRECOGNITION
HEK293 Cells
lcsh:Q
Data mining
Lod Score
Databases, Nucleic Acid
K562 Cells
computer
030217 neurology & neurosurgery
Software
Details
- Language :
- English
- ISSN :
- 20411723
- Volume :
- 11
- Database :
- OpenAIRE
- Journal :
- Nature Communications
- Accession number :
- edsair.doi.dedup.....f104567353f888cf4a560f98566d6004