
Tximeta: Reference sequence checksums for provenance identification in RNA-seq.

Authors :
Michael I Love
Charlotte Soneson
Peter F Hickey
Lisa K Johnson
N Tessa Pierce
Lori Shepherd
Martin Morgan
Rob Patro
Source :
PLoS Computational Biology, Vol 16, Iss 2, p e1007664 (2020)
Publication Year :
2020
Publisher :
Public Library of Science (PLoS), 2020.

Abstract

Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
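
The checksum-based identification described in the abstract can be illustrated with a minimal sketch. This is not the tximeta implementation (tximeta is an R/Bioconductor package, and the abstract does not specify the hashing details). The choice of SHA-256 over the concatenated transcript sequences, the function names, the in-memory registry, and the toy sequences and metadata are all assumptions made for illustration only.

```python
import hashlib


def reference_checksum(sequences):
    """Return a SHA-256 hex digest over the sequences, taken in order.

    Assumption: hashing the uppercased, concatenated sequences. A real
    quantification tool may hash its reference differently.
    """
    digest = hashlib.sha256()
    for seq in sequences:
        digest.update(seq.upper().encode("ascii"))
    return digest.hexdigest()


def build_registry(known_references):
    """Map checksum -> annotation metadata for each known reference.

    `known_references` is a list of (metadata, sequences) pairs.
    This registry is hypothetical, standing in for a table of known
    reference transcriptomes.
    """
    return {reference_checksum(seqs): meta for meta, seqs in known_references}


def identify_reference(checksum, registry):
    """Return the metadata for a checksum, or None if it is unknown."""
    return registry.get(checksum)


# Toy data (made up): two short "transcripts" and their annotation metadata.
toy_reference = ["ACGTACGT", "GGGTTTAA"]
toy_metadata = {"source": "ExampleDB", "release": "1", "organism": "toy"}
registry = build_registry([(toy_metadata, toy_reference)])

# Stands in for the checksum stored alongside the quantification output.
stored = reference_checksum(toy_reference)
match = identify_reference(stored, registry)

# A reference that differs by a single base produces a different checksum,
# so it is not matched to the known annotation.
unknown = identify_reference(reference_checksum(["ACGTACGA", "GGGTTTAA"]), registry)
```

In this sketch, `match` holds the toy metadata and `unknown` is `None`. The point is that the annotation is recovered from the sequence content itself rather than from a file name or a user-supplied label, so missing or incorrect labels on shared files do not affect the lookup.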

Subjects

Subjects :
Biology (General)
QH301-705.5

Details

Language :
English
ISSN :
1553-734X and 1553-7358
Volume :
16
Issue :
2
Database :
Directory of Open Access Journals
Journal :
PLoS Computational Biology
Publication Type :
Academic Journal
Accession number :
edsdoj.9cc4aebd6e4be7bcb1f88ca81d32c9
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pcbi.1007664