Identity and compatibility of reference genome resources
- Source :
- NAR Genomics and Bioinformatics
- Publication Year :
- 2021
- Publisher :
- Oxford University Press (OUP), 2021.
Abstract
- Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: First, we derive unique identifiers for each resource; second, we record parent-child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data. Availability: https://refgenie.databio.org
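The three advances named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the refgenie API: the function names (`digest`, `collection_digests`, `derived_resource`), the use of truncated SHA-256, the JSON serialization, and the choice to sort sequence names are all assumptions made for illustration. It shows one way a content-derived identifier could be layered so that one level answers "are these the same sequences?" and another answers "do these share a coordinate system?".

```python
import hashlib
import json
from typing import Dict


def digest(text: str) -> str:
    """Hypothetical helper: hex SHA-256 of a string, truncated to 24 characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:24]


def collection_digests(seqs: Dict[str, str]) -> Dict[str, str]:
    """Return layered digests for a {sequence name: sequence} mapping."""
    names = sorted(seqs)  # assumption: identity does not depend on input order
    # Level 1: one digest per raw sequence.
    per_sequence = [digest(seqs[n]) for n in names]
    # Level 2: one digest for sequence content, one for names plus lengths.
    sequences = digest(json.dumps(per_sequence))
    coords = digest(json.dumps([[n, len(seqs[n])] for n in names]))
    # Level 3: identifier of the whole collection, built from level-2 digests.
    top = digest(json.dumps({"coords": coords, "sequences": sequences}, sort_keys=True))
    return {"top": top, "sequences": sequences, "coords": coords}


def derived_resource(parent_id: str, kind: str) -> Dict[str, str]:
    """Record a child resource (e.g. an aligner index) with a link to its parent."""
    return {"id": digest(parent_id + "/" + kind), "kind": kind, "parent": parent_id}


a = collection_digests({"chr1": "ACGT", "chr2": "GGCC"})
b = collection_digests({"chr1": "ACGA", "chr2": "GGCC"})  # same names and lengths, different bases
index = derived_resource(a["top"], "aligner_index")

print(a["top"] == b["top"])          # identity check on the whole collection
print(a["coords"] == b["coords"])    # coordinate-system compatibility check
print(index["parent"] == a["top"])   # provenance check via the parent link
```

In this sketch, comparing top-level digests tests identity, while comparing only the names-and-lengths digest tests whether two resources could share annotations, even when their bases differ.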
- Subjects :
- AcademicSubjects/SCI01140
AcademicSubjects/SCI01060
Computer science
AcademicSubjects/SCI00030
AcademicSubjects/SCI01180
Genome
Unique identifier
03 medical and health sciences
Software portability
0302 clinical medicine
Resource (project management)
Structural Biology
Genetics
Molecular Biology
030304 developmental biology
0303 health sciences
Information retrieval
Applied Mathematics
APP Notes
Computer Science Applications
Identifier
Reference data
Identity (object-oriented programming)
AcademicSubjects/SCI00980
030217 neurology & neurosurgery
Reference genome
Details
- ISSN :
- 2631-9268
- Volume :
- 3
- Database :
- OpenAIRE
- Journal :
- NAR Genomics and Bioinformatics
- Accession number :
- edsair.doi.dedup.....9e3872f463011abd3e89bf9760738efd
- Full Text :
- https://doi.org/10.1093/nargab/lqab036