Detecting sample swaps in diverse NGS data types using linkage disequilibrium

Authors :
Charles B. Epstein
Timothy Fennell
Yossi Farjoun
Noam Shoresh
Bradley E. Bernstein
Nauman M. Javed
Source :
Nature Communications, Vol 11, Iss 1, Pp 1-8 (2020)
Publication Year :
2020
Publisher :
Nature Publishing Group UK, 2020.

Abstract

As the number of genomics datasets grows rapidly, sample mislabeling has become a high-stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample-relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.

Parallelized analysis in clinical genomics can lead to sample or data mislabeling, which could have serious downstream consequences. Here the authors present a tool to quantify sample genetic relatedness and detect such mistakes, and apply it to thousands of datasets from the ENCODE consortium.

Details

Language :
English
ISSN :
2041-1723
Volume :
11
Database :
OpenAIRE
Journal :
Nature Communications
Accession number :
edsair.doi.dedup.....f104567353f888cf4a560f98566d6004