GenoVault: a cloud based genomics repository
- Source :
- BioData Mining, Vol 14, Iss 1, Pp 1-10 (2021)
- Publication Year :
- 2021
- Publisher :
- BMC, 2021.
Abstract
- GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data. It is built on an OpenStack-based private cloud and uses OpenStack services such as Keystone for authentication, Cinder for block storage, Neutron for networking and Nova for managing compute instances. GenoVault uses object-based storage, in which data are stored as objects rather than as files or blocks, allowing faster retrieval from distributed object nodes. In addition to a web-based interface, a JavaFX-based desktop client has been developed to handle the large file uploads (> 5 GB) typical of NGS datasets. Users can store up to one million files in their own object-based storage areas, and the metadata they supply during file upload is used to query the database. The GenoVault repository is designed with future needs in mind and can therefore scale both vertically and horizontally without any change to its design. Users can either make their data publicly shareable or restrict it to private access. Data security is provided by the object-based storage architecture, in which every container is a separate entity, and by the Secure File Transfer Protocol used during data upload and download. Users upload their data to individual containers, including raw read files (fastq), processed alignment files (bam, sam, bed) and variant-detection output (vcf). The GenoVault architecture verifies data integrity and authenticity before making the data available to collaborators according to user permissions. GenoVault is useful for maintaining organization-wide NGS data generated by experiments in various labs that have not yet been published or submitted to public repositories such as NCBI. GenoVault also supports sharing NGS data among collaborating institutions. GenoVault can thus manage large volumes of NGS data on any OpenStack-based private cloud.
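The abstract describes per-user containers, metadata-driven queries, public/private visibility and integrity checks before data are shared, but gives no code. The Python sketch below models those ideas in memory so the data flow is easier to follow. It is a hypothetical illustration, not GenoVault's implementation or any OpenStack API: the `Repository`, `Container` and `StoredObject` classes, the use of MD5 as the checksum, the file-extension whitelist and the tiny `segment_size` (standing in for a segmented upload of files above a per-object size limit) are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical whitelist built from the file formats named in the abstract.
ALLOWED_EXTENSIONS = (".fastq", ".bam", ".sam", ".bed", ".vcf")


@dataclass
class StoredObject:
    name: str
    segments: list          # payload split into fixed-size byte segments
    md5: str                # checksum recorded at upload time (assumed MD5)
    metadata: dict          # user-supplied key/value metadata used for queries
    public: bool = False    # private by default


@dataclass
class Container:
    owner: str
    objects: dict = field(default_factory=dict)


class Repository:
    """Hypothetical in-memory model: one container per user."""

    def __init__(self, segment_size: int):
        self.segment_size = segment_size
        self.containers = {}

    def upload(self, user, name, data, metadata, public=False):
        # Reject formats outside the assumed NGS whitelist.
        if not name.endswith(ALLOWED_EXTENSIONS):
            raise ValueError(f"unsupported file type: {name}")
        container = self.containers.setdefault(user, Container(owner=user))
        # Split the payload into segments (illustrative segmented upload).
        segments = [data[i:i + self.segment_size]
                    for i in range(0, len(data), self.segment_size)]
        checksum = hashlib.md5(data).hexdigest()
        container.objects[name] = StoredObject(
            name, segments, checksum, dict(metadata), public)

    def verify(self, owner, name):
        # Recompute the checksum over the reassembled segments.
        obj = self.containers[owner].objects[name]
        return hashlib.md5(b"".join(obj.segments)).hexdigest() == obj.md5

    def download(self, requester, owner, name):
        obj = self.containers[owner].objects[name]
        # Only the owner may read private objects.
        if requester != owner and not obj.public:
            raise PermissionError(f"{requester} may not read {owner}/{name}")
        # Refuse to serve data whose checksum no longer matches.
        if not self.verify(owner, name):
            raise ValueError(f"integrity check failed for {owner}/{name}")
        return b"".join(obj.segments)

    def query(self, requester, **criteria):
        # Return (owner, name) pairs whose metadata match every criterion
        # and that the requester is allowed to see.
        hits = []
        for owner, container in self.containers.items():
            for obj in container.objects.values():
                visible = owner == requester or obj.public
                if visible and all(obj.metadata.get(k) == v
                                   for k, v in criteria.items()):
                    hits.append((owner, obj.name))
        return sorted(hits)


repo = Repository(segment_size=4)
repo.upload("alice", "sample1.fastq", b"@r1\nACGT\n+\nIIII\n",
            {"organism": "rice"})
repo.upload("alice", "sample1.vcf", b"##fileformat=VCFv4.2\n",
            {"organism": "rice"}, public=True)
print(repo.query("bob", organism="rice"))  # [('alice', 'sample1.vcf')]
```

Here `bob` sees only the public VCF, while `repo.query("alice", organism="rice")` returns both of `alice`'s files. Keeping visibility as a per-object flag puts the query and download permission checks in one place; a real deployment would delegate authentication to a service such as Keystone rather than trusting the `requester` string.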
- Subjects :
- Computer science
- Computer applications to medicine. Medical informatics
- R858-859.7
- Data security
- Cloud computing
- Biochemistry
- 03 medical and health sciences
- Upload
- 0302 clinical medicine
- Genetics
- Molecular Biology
- 030304 developmental biology
- 0303 health sciences
- QA299.6-433
- Database
- Distributed object
- Object (computer science)
- Software Article
- Computer Science Applications
- Secure File Transfer Protocol
- Metadata
- Computational Mathematics
- OpenStack
- Computational Theory and Mathematics
- Genomics repository
- 030220 oncology & carcinogenesis
- NGS
- Container (abstract data type)
- Cloud
- Analysis
Details
- Language :
- English
- ISSN :
- 17560381
- Volume :
- 14
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BioData Mining
- Accession number :
- edsair.doi.dedup.....b65ad70aba966fdd03c84f586d9a26e9