1. MetaGeneBank: a standardized database to study deep sequenced metagenomic data from human fecal specimen
- Author
Jingyang Qian, Xiaohui Fan, Wenbin Chen, Jie Liao, and Li Shao
- Subjects
Microbiology (medical); Test data generation; Big data; Biology; computer.software_genre; Microbiology; Data type; Deep sequenced metagenomes; Database; Feces; Databases, Genetic; Humans; Microbiome; Interpretability; Gut microbiome; business.industry; High-Throughput Nucleotide Sequencing; Human disease; Benchmarking; QR1-502; Gastrointestinal Microbiome; Metagenomics; Sample collection; business; computer
- Abstract
Background: Microbiome big data from population-scale cohorts holds the key to unleashing the power of microbiomes to overcome critical challenges in disease control, treatment, and precision medicine. However, variations introduced during data generation and processing limit the interpretability of comparisons among independent studies. Although multiple databases have been constructed as platforms for data reuse, they are of limited value because they consider only raw sequencing files.
Description: Here, we present MetaGeneBank, a standardized database that provides details on sample collection and sequencing, together with abundances of genes, microbiota, and molecular functions, for 4470 raw sequencing files (over 12 TB) collected from 16 studies covering over 10 types of diseases and 14 countries, all processed with a unified data-processing pipeline. Integrated tools for browsing and searching by descriptive attributes, gene sequences, microbiota, and functions make the database user-friendly. We found that the source of the specimen contributes more to variation in microbiota than the sequencing center or platform does. Special attention should therefore be paid when re-analyzing sequencing files from different countries.
Conclusions: Collectively, MetaGeneBank provides a gateway to the untapped potential of gut metagenomic data for helping to fight human diseases. With continuous updating of the database in terms of data volume, data types, and sample types, MetaGeneBank would undoubtedly be the benchmarking database for data reuse in the future, and would be valuable in translational science.
- Published
- 2021