
Identity and compatibility of reference genome resources

Authors :
Michał Stolarczyk
Nathan C. Sheffield
Bingjie Xue
Source :
NAR Genomics and Bioinformatics
Publication Year :
2021
Publisher :
Oxford University Press (OUP), 2021.

Abstract

Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: first, we derive unique identifiers for each resource; second, we record parent-child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data.

Availability: https://refgenie.databio.org
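The recursive-identifier idea in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the refgenie implementation. The truncated SHA-512 digest (modeled on the GA4GH refget `sha512t24u` convention), the comma-joined serialization, the separate "coordinates" digest, and the helper names `sha512t24u`, `digest_array`, `collection_identifiers`, and `compare` are all illustrative choices. Each sequence is digested first. The arrays of names, lengths, and sequence digests are then digested, and a top-level digest is built from those. As a result, two resources can be compared at more than one level.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    # Assumed digest: SHA-512 truncated to 24 bytes, base64url-encoded (refget-style).
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def digest_array(values) -> str:
    # Hypothetical serialization: join string forms with commas, then digest.
    return sha512t24u(",".join(str(v) for v in values).encode("utf-8"))


def collection_identifiers(seqs: dict) -> dict:
    """Recursive identifiers for a hypothetical collection {name: sequence}."""
    names = list(seqs)
    lengths = [len(s) for s in seqs.values()]
    # Level 0: one digest per sequence (upper-cased so case does not change identity).
    seq_digests = [sha512t24u(s.upper().encode("ascii")) for s in seqs.values()]
    # Level 1: one digest per attribute array.
    ids = {
        "names": digest_array(names),
        "lengths": digest_array(lengths),
        "sequences": digest_array(seq_digests),
    }
    # Level 2: the collection digest is computed from the level-1 digests.
    ids["collection"] = digest_array([ids["names"], ids["lengths"], ids["sequences"]])
    # Coordinate-system digest: names paired with lengths, ignoring the bases.
    ids["coordinates"] = digest_array(f"{n}:{ln}" for n, ln in zip(names, lengths))
    return ids


def compare(a: dict, b: dict) -> str:
    # Check from strictest (full identity) to weakest (shared sequence content).
    ida, idb = collection_identifiers(a), collection_identifiers(b)
    if ida["collection"] == idb["collection"]:
        return "identical"
    if ida["coordinates"] == idb["coordinates"]:
        return "same coordinate system"
    if ida["sequences"] == idb["sequences"]:
        return "same sequences, different names"
    return "incompatible"


genome_a = {"chr1": "ACGTACGT", "chr2": "GGCC"}
genome_b = {"chr1": "ACGTACGA", "chr2": "GGCC"}  # one base differs; names and lengths match
genome_c = {"chrA": "ACGTACGT", "chrB": "GGCC"}  # same sequences under different names

print(compare(genome_a, genome_a))  # identical
print(compare(genome_a, genome_b))  # same coordinate system
print(compare(genome_a, genome_c))  # same sequences, different names
```

Because the coordinate digest ignores the bases, two assemblies that differ only in sequence content still compare as sharing a coordinate system. This is the kind of compatibility question the abstract raises for annotations.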

Details

ISSN :
26319268
Volume :
3
Database :
OpenAIRE
Journal :
NAR Genomics and Bioinformatics
Accession number :
edsair.doi.dedup.....9e3872f463011abd3e89bf9760738efd
Full Text :
https://doi.org/10.1093/nargab/lqab036