A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
- Source :
- PLoS Computational Biology, Vol 12, Iss 12, p e1005224 (2016)
- Publication Year :
- 2016
- Publisher :
- Public Library of Science, 2016.
Abstract
- Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmentary protein-coding genes are first predicted directly from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmented. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs are utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in the degradation of chemical hazards.
Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.
Author Summary
- In recent years, meta-omic (including metatranscriptomic and metaproteomic) techniques have been adopted as complementary approaches to metagenomic sequencing to study the functional characteristics and dynamics of microbial communities, aiming at a holistic understanding of how a community responds to changes in the environment. Currently, metaproteomic data are largely analyzed using bioinformatics tools originally designed for bottom-up proteomics. In particular, recent metaproteomic studies employed a metagenome-guided approach, in which complete or fragmentary protein-coding genes were first predicted from metagenomic sequences (i.e., contigs or scaffolds) acquired from the matched community samples, and the predicted protein sequences were then used in peptide identification. A key challenge of this approach is that the protein-coding genes predicted from assembled metagenomic contigs can be incomplete and fragmented due to the complexity of metagenomic samples and the short read length in metagenomic sequencing. To address this issue, in this paper, we present a graph-centric approach that exploits the de Bruijn graph structure reported by metagenome assembly algorithms to improve metagenome-guided peptide and protein identification in metaproteomics. We show that our method can identify many more peptides and proteins, improving the characterization of the proteins expressed in the microbial communities.
- Subjects :
- 0301 basic medicine
Proteomics
Peptide
Plant Science
Biochemistry
De Bruijn graph
Database and Informatics Methods
Tandem Mass Spectrometry
Database Searching
Photosynthesis
lcsh:QH301-705.5
chemistry.chemical_classification
Ecology
Plant Biochemistry
Microbiota
Genomics
6. Clean water
Computational Theory and Mathematics
Modeling and Simulation
symbols
Sequence Analysis
Algorithms
Research Article
Gene prediction
Sequence Databases
Computational biology
Biology
Research and Analysis Methods
03 medical and health sciences
Cellular and Molecular Neuroscience
symbols.namesake
Genetics
Ribulose-1,5-Bisphosphate Carboxylase Oxygenase
Humans
Molecular Biology Techniques
Sequencing Techniques
Sequence Similarity Searching
Gene
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Sequence Assembly Tools
Biology and Life Sciences
Proteins
Genome Analysis
030104 developmental biology
Biological Databases
lcsh:Biology (General)
chemistry
Metagenomics
Metaproteomics
Protein identification
Peptides
Details
- Language :
- English
- ISSN :
- 1553-7358 and 1553-734X
- Volume :
- 12
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- PLoS Computational Biology
- Accession number :
- edsair.doi.dedup.....fea395cdc2f06e811688d27365f910e3