GenoVault: a cloud based genomics repository

Authors :
Rajendra Joshi
Neeraj Bharti
Sunitha Manjari Kasibhatla
Uddhavesh Sonavane
Suprit Hesarur
Amit Saxena
Kirti Bhadhadhara
Sankalp Jain
Source :
BioData Mining, Vol 14, Iss 1, Pp 1-10 (2021)
Publication Year :
2021
Publisher :
BMC, 2021.

Abstract

GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data. It is built on an OpenStack-based private cloud and uses services such as Keystone for authentication, Cinder for block storage, Neutron for networking, and Nova for managing compute instances. GenoVault uses object-based storage, in which data is stored as objects rather than as files or blocks, for faster retrieval from distributed object nodes. In addition to a web-based interface, a JavaFX-based desktop client has been developed to handle the large file uploads (> 5 GB) that are common in NGS datasets. Users can store as many as one million files in their respective object-based storage areas, and the metadata they provide during file upload is used for querying the database. The GenoVault repository is designed with future needs in mind and can scale both vertically and horizontally without any modification to the design. Users can either share their data with the public or keep access private. Data is kept secure because every container is a separate entity in the object-based storage architecture, and a secured file transfer protocol is used during data upload and download. Users upload data into their individual containers, which can hold raw read files (fastq), processed alignment files (bam, sam, bed), and the output of variation detection (vcf). The GenoVault architecture allows the data to be verified for integrity and authenticity before it is made available to collaborators according to user permissions. GenoVault is useful for maintaining organization-wide NGS data that has been generated by experiments in various labs but has not yet been published or submitted to public repositories such as NCBI. GenoVault also supports sharing NGS data among collaborating institutions. GenoVault can thus manage vast volumes of NGS data on any OpenStack-based private cloud.
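
The container model described in the abstract (per-user containers, user-supplied metadata used for queries, integrity verification, and a public/private switch) can be illustrated with a small in-memory sketch. This is not GenoVault's code and does not call any OpenStack API; the names `Container`, `StoredObject`, `upload`, `verify`, `query`, and `can_read` are hypothetical, and the use of SHA-256 for the integrity check is an assumption, since the abstract does not say which checksum is used.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object in a container: payload, user metadata, and a checksum."""
    data: bytes
    metadata: dict
    checksum: str  # SHA-256 hex digest recorded at upload time (assumed scheme)


@dataclass
class Container:
    """Hypothetical per-user container; private unless the owner makes it public."""
    owner: str
    public: bool = False
    objects: dict = field(default_factory=dict)

    def upload(self, name, data, metadata):
        # Record a digest at upload so later reads can be checked against it.
        checksum = hashlib.sha256(data).hexdigest()
        self.objects[name] = StoredObject(data, dict(metadata), checksum)
        return checksum

    def verify(self, name):
        # Integrity check: recompute the digest and compare with the stored one.
        obj = self.objects[name]
        return hashlib.sha256(obj.data).hexdigest() == obj.checksum

    def query(self, **criteria):
        # Return names of objects whose metadata matches every given key/value.
        return sorted(
            name
            for name, obj in self.objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )

    def can_read(self, user):
        # Owners can always read; everyone else only if the container is public.
        return self.public or user == self.owner
```

For example, after uploading a fastq file with metadata `{"format": "fastq"}` into a container owned by a hypothetical user `alice`, `query(format="fastq")` returns that file's name, `verify` returns `True` until the stored bytes change, and `can_read("bob")` returns `False` until the owner sets `public = True`.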

Details

Language :
English
ISSN :
1756-0381
Volume :
14
Issue :
1
Database :
OpenAIRE
Journal :
BioData Mining
Accession number :
edsair.doi.dedup.....b65ad70aba966fdd03c84f586d9a26e9