84 results on '"NeXtProt"'
Search Results
2. Progress Identifying and Analyzing the Human Proteome: 2021 Metrics from the HUPO Human Proteome Project
- Author
-
Fernando J. Corrales, Lydie Lane, Eric W. Deutsch, Sudhir Srivastava, Yu-Ju Chen, Ruedi Aebersold, Nuno Bandeira, Young Ki Paik, Susan T. Weintraub, Siqi Liu, Ileana M. Cristea, Cecilia Lindskog, Michael H.A. Roehrl, Robert L. Moritz, Gilbert S. Omenn, and Christopher M. Overall
- Subjects
Proteomics ,0303 health sciences ,NeXtProt ,Proteome ,030302 biochemistry & molecular biology ,Human Protein Atlas ,Genomics ,General Chemistry ,Computational biology ,Biology ,Biochemistry ,Mass Spectrometry ,Article ,3. Good health ,Glycoproteomics ,03 medical and health sciences ,Benchmarking ,Human proteome project ,Humans ,PeptideAtlas ,Databases, Protein ,030304 developmental biology
- Abstract
The 2021 Metrics of the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 357 (92.8%) of the 19 778 predicted proteins coded in the human genome, a gain of 483 since 2020 from reports throughout the world reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 478 to 1421. This represents remarkable progress on the proteome parts list. The utilization of proteomics in a broad array of biological and clinical studies likewise continues to expand with many important findings and effective integration with other omics platforms. We present highlights from the Immunopeptidomics, Glycoproteomics, Infectious Disease, Cardiovascular, Musculo-Skeletal, Liver, and Cancers B/D-HPP teams and from the Knowledgebase, Mass Spectrometry, Antibody Profiling, and Pathology resource pillars, as well as ethical considerations important to the clinical utilization of proteomics and protein biomarkers.
- Published
- 2021
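The headline figures in the abstract above can be cross-checked with simple arithmetic. The sketch below uses only the counts quoted in that abstract; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the 2021 HPP metrics abstract.
predicted_proteins = 19_778  # proteins predicted from the human genome
pe1_proteins = 18_357        # credibly detected (neXtProt PE1)
reduction_since_2020 = 478   # stated drop in PE2-4 "missing proteins"

# Missing proteins are the predicted entries not yet at PE1.
missing_proteins = predicted_proteins - pe1_proteins
pe1_percent = round(100 * pe1_proteins / predicted_proteins, 1)
missing_in_2020 = missing_proteins + reduction_since_2020

print(missing_proteins)  # 1421
print(pe1_percent)       # 92.8
print(missing_in_2020)   # 1899
```

The implied 2020 count of 1899 missing proteins agrees with the figure quoted in the 2020 records further down this list.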
3. Theoretical considerations for next-generation proteomics
- Author
-
Magnus Palmblad
- Subjects
Proteomics ,0301 basic medicine ,next-generation proteomics ,Proteome ,Computer science ,Peptide ,Computational biology ,Biochemistry ,fluorosequencing ,Mass Spectrometry ,peptide−partial read match ,03 medical and health sciences ,protein identification ,NeXtProt ,Technical Note ,enzymatic digestion ,peptide-partial read match ,Humans ,Amino Acid Sequence ,theory ,chemistry.chemical_classification ,030102 biochemistry & molecular biology ,Protein molecules ,General Chemistry ,simulation ,Amino acid ,030104 developmental biology ,chemistry ,single-molecule sequencing ,Peptide sequencing ,Peptides ,Function (biology)
- Abstract
While mass spectrometry still dominates proteomics research, alternative and potentially disruptive, next-generation technologies are receiving increased investment and attention. Most of these technologies aim at the sequencing of single peptide or protein molecules, typically labeling or otherwise distinguishing a subset of the proteinogenic amino acids. This note considers some theoretical aspects of these future technologies from a bottom-up proteomics viewpoint, including the ability to uniquely identify human proteins as a function of which and how many amino acids can be read, enzymatic efficiency, and the maximum read length. This is done through simulations under ideal and non-ideal conditions to set benchmarks for what may be achievable with future single-molecule sequencing technology. The simulations reveal, among other observations, that the best choice of reading N amino acids performs similarly to the average choice of N+1 amino acids, and that the discrimination power of the amino acids scales with their frequency in the proteome. The simulations are agnostic with respect to the next-generation proteomics platform, and the results and conclusions should therefore be applicable to any single-molecule partial peptide sequencing technology.
- Published
- 2021
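The idea of identifying peptides from a partial read, as described in the abstract above, can be illustrated with a toy calculation. This is a minimal sketch and not the author's simulation: it assumes that unreadable residues are masked but still counted (so read length is preserved), and the four peptides are invented for illustration.

```python
from collections import Counter


def partial_read(peptide: str, readable: set[str]) -> str:
    """Keep readable residues; mask every other residue with 'x'."""
    return "".join(aa if aa in readable else "x" for aa in peptide)


def unique_fraction(peptides: list[str], readable: set[str]) -> float:
    """Fraction of peptides whose partial read matches no other peptide."""
    reads = [partial_read(p, readable) for p in peptides]
    counts = Counter(reads)
    return sum(counts[r] == 1 for r in reads) / len(reads)


# Toy peptides, hypothetical and not drawn from any database.
peptides = ["ACDK", "AGDK", "LCEK", "LCDR"]

print(unique_fraction(peptides, {"K"}))            # 0.25
print(unique_fraction(peptides, {"C", "K"}))       # 0.5
print(unique_fraction(peptides, {"C", "D", "K"}))  # 1.0
```

In this toy set, each additional readable amino acid increases the share of uniquely identifiable peptides, which is the kind of trade-off the simulations in the paper quantify at proteome scale.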
4. iHPDM: In Silico Human Proteome Digestion Map with Proteolytic Peptide Analysis and Graphical Visualizations
- Author
-
Ting-Yi Sung, Jen-Hung Wang, Wai-Kok Choong, and Ching-Tai Chen
- Subjects
0301 basic medicine ,Proteases ,Protease ,030102 biochemistry & molecular biology ,NeXtProt ,Chemistry ,medicine.medical_treatment ,General Chemistry ,Computational biology ,Proteomics ,Trypsin ,Biochemistry ,03 medical and health sciences ,030104 developmental biology ,Protein sequencing ,Human proteome project ,medicine ,Shotgun proteomics ,medicine.drug
- Abstract
When conducting proteomics experiments to detect missing proteins and protein isoforms in the human proteome, it is desirable to use a protease that can yield more unique peptides with properties amenable for mass spectrometry analysis. Though trypsin is currently the most widely used protease, some proteins can yield only a limited number of unique peptides by trypsin digestion. Other proteases and multiple proteases have been applied in reported studies to increase the number of identified proteins and protein sequence coverage. To facilitate the selection of proteases, we developed a web-based resource, called in silico Human Proteome Digestion Map (iHPDM), which contains a comprehensive proteolytic peptide database constructed from human proteins, including isoforms, in neXtProt digested by 15 protease combinations of one or two proteases. iHPDM provides convenient functions and graphical visualizations for users to examine and compare the digestion results of different proteases. Notably, it also supports users to input filtering criteria on digested peptides, e.g., peptide length and uniqueness, to select suitable proteases. iHPDM can facilitate protease selection for shotgun proteomics experiments to identify missing proteins, protein isoforms, and single amino acid variant peptides.
- Published
- 2019
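An in silico digestion of the kind described in the abstract above can be sketched in a few lines. The example assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages; the rules actually used by iHPDM are not stated in this record, and the sequence is a made-up example.

```python
def digest_trypsin(sequence: str, min_length: int = 1) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1:i + 2]  # empty string at the C-terminus
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if sequence[start:]:
        peptides.append(sequence[start:])  # C-terminal remainder
    # Length filter, analogous to the peptide-length criterion in the abstract.
    return [p for p in peptides if len(p) >= min_length]


# Hypothetical sequence, for illustration only.
print(digest_trypsin("MKTAYRPLLKAG"))                # ['MK', 'TAYRPLLK', 'AG']
print(digest_trypsin("MKTAYRPLLKAG", min_length=3))  # ['TAYRPLLK']
```

Swapping the cleavage rule (for example, cleaving after K only for Lys-C) and comparing the resulting peptide lists is the basic operation behind comparing proteases.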
5. Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years
- Author
-
Gilbert S. Omenn and Kurt Wuthrich
- Subjects
Societies, Scientific ,Proteome ,HUPO, Human Proteome Organization ,Mass Spectrometry Data Interpretation Guidelines ,Library science ,Biochemistry ,History, 21st Century ,Analytical Chemistry ,PRIDE, Proteomics Identification Database ,03 medical and health sciences ,functionally unannotated proteins ,Political science ,Human proteome project ,Milestone (project management) ,neXtProt ,Humans ,SRM, selected reaction monitoring ,Molecular Biology ,Human Proteome Project ,030304 developmental biology ,0303 health sciences ,NeXtProt ,Information Dissemination ,TPP, Trans-Proteomic Pipeline ,030302 biochemistry & molecular biology ,MP, missing proteins according to neXtProt ,Data Accuracy ,PTM, posttranslationally modified ,MS, mass spectrometry ,Perspective ,missing proteins ,blueprint ,HPP, Human Proteome Project
- Abstract
We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously “missing proteins.” This invited perspective complements papers on “A High-Stringency Blueprint of the Human Proteome” and “The Human Proteome Reaches a Major Milestone” in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.
Highlights
• The global Human Proteome Project is the flagship activity of the HUPO.
• HPP Guidelines for MS Data have greatly enhanced confidence in proteomics data.
• The community has identified proteins from 90% of predicted protein-coding genes.
• A total of 1899 predicted proteins lack sufficient evidence of expression as of 2020.
In Brief
Starting from several organ-oriented projects, HUPO in 2010 launched the Human Proteome Project to identify and characterize the protein parts list and integrate proteomics into multiomics research. Key steps were partnerships with neXtProt, PRIDE, PeptideAtlas, Human Protein Atlas, and instrument makers; global engagement of researchers; creation of ProteomeXchange; adoption of HPP Guidelines for Interpretation of MS Data and SRMAtlas for proteotypic peptides; annual metrics of finding “missing proteins” and functionally annotating proteins; and initiatives for early career scientists.
- Published
- 2021
6. Open-pFind Verified Four Missing Proteins from Multi-Tissues
- Author
-
Yuping Xie, Lei Chang, Jinshuai Sun, Bowen Zhong, Shujia Wu, Xi Wang, Feng Xu, Zhonghua Yan, Ping Xu, Yanchang Li, Junzhu Wu, Fuchu He, Hao Chi, Dongxue Wang, and Yao Zhang
- Subjects
chemistry.chemical_classification ,Male ,Proteomics ,NeXtProt ,Proteome ,Peptide ,General Chemistry ,Computational biology ,Biology ,Trypsin ,Biochemistry ,Transmembrane protein ,Mass Spectrometry ,Molecular Weight ,chemistry.chemical_compound ,chemistry ,Human proteome project ,Peptide synthesis ,medicine ,Humans ,Female ,PeptideAtlas ,Peptides ,medicine.drug
- Abstract
The Chromosome-Centric Human Proteome Project (C-HPP) was launched in 2012 to perfect the annotation of human protein existence by identifying stronger evidence of the expression of missing proteins (MPs) at the protein level. After an 8-year effort all over the world, the number of MPs in the neXtProt database significantly decreased from 5511 (2012-02-24) to 1899 (2020-01-17). It is now more difficult to provide confident evidence of the remaining MPs because of their specific characteristics, including low abundance, low molecular weight, unexpected modifications, transmembrane structure, tissue-expression specificity, and so on. A higher resolution mass spectrometry (MS) interpretation engine might provide an opportunity to identify these buried MPs in complex samples by the combination with multi-tissue large-scale proteomics. In this study, open-pFind was used to dig MPs from 20 pairs of healthy human tissues by Wang et al. (Mol. Syst. Biol. 2019, 15 (2), e8503) combined with our large-scale testis data set digested by three enzymes (Glu-C, Lys-C, and trypsin) with specificity for different amino acid residues (J. Proteome Res. 2019, 18 (12), 4189-4196). A total of 1 535 536 peptides with 17 283 477 peptide-spectrum matches (PSMs) were mapped to 14 279 protein entries at a false discovery rate of
- Published
- 2020
7. Bioinformatic Prediction of Gene Ontology Terms of Uncharacterized Proteins from Chromosome 11
- Author
-
Yeji Yang, Jin Young Kim, Heeyoun Hwang, Yun-Hee Kim, Hye Jin Kim, Ji Eun Im, Kyung-Hoon Kwon, and Jong Shin Yoo
- Subjects
NeXtProt ,Gene ontology ,Chromosomes, Human, Pair 11 ,HEK 293 cells ,CCDC90B ,Computational Biology ,Proteins ,General Chemistry ,Computational biology ,Biology ,Biochemistry ,Annotation ,Gene Ontology ,Chromosome (genetic algorithm) ,Humans ,Databases, Protein ,Function (biology) ,Web site
- Abstract
In chromosome 11, 71 out of its 1254 proteins remain functionally uncharacterized on the basis of their existence evidence (uPE1s) following the latest version of neXtProt (release 2020-01-17). Because in vivo and in vitro experimental strategies are often time-consuming and labor-intensive, there is a need for a bioinformatics tool to predict the function annotation. Here, we used I-TASSER/COFACTOR provided on the neXtProt web site, which predicts gene ontology (GO) terms based on the 3D structure of the protein. I-TASSER/COFACTOR predicted 2413 GO terms with a benchmark dataset of the 22 proteins belonging to PE1 of chromosome 11. In this study, we developed a filtering algorithm in order to select specific GO terms using the GO map generated by I-TASSER/COFACTOR. As a result, 187 specific GO terms showed a higher average precision-recall score at the least cellular component term compared to 2413 predicted GO terms. Next, we applied 65 proteins belonging to uPE1s of chromosome 11, and then 409 out of 6684 GO terms survived, where 103 and 142 GO terms of molecular function and biological process, respectively, were included. Representatively, the cellular component GO terms of CCDC90B, C11orf52, and the SMAP were predicted and validated using the overexpression system into 293T cells and immunofluorescence staining. We will further study their biological and molecular functions toward the goal of the neXt-CP50 project as a part of C-HPP. We shared all results and programs in Github (https://github.com/heeyounh/I-TASSER-COFACTOR-filtering.git).
- Published
- 2020
8. Comparative Proteomic Profiling of 3T3-L1 Adipocyte Differentiation Using SILAC Quantification
- Author
-
Neha Goswami, Frank Schmidt, and Sunkyu Choi
- Subjects
0301 basic medicine ,Proteomics ,Adipogenesis ,030102 biochemistry & molecular biology ,NeXtProt ,Proteomic Profiling ,Chemistry ,Cell Differentiation ,General Chemistry ,Biochemistry ,Cell biology ,03 medical and health sciences ,Mice ,030104 developmental biology ,Stable isotope labeling by amino acids in cell culture ,3T3-L1 Cells ,Proteome ,Human proteome project ,Adipocytes ,Animals ,Humans ,Cristae formation
- Abstract
Adipocyte differentiation is a general physiological process that is also critical for metabolic syndrome. In spite of extensive study in the past two decades, adipogenesis is still a complex cellular process that is accompanied by complicated molecular mechanisms. Here, we performed SILAC-based quantitative global proteomic profiling of 3T3-L1 adipocyte differentiation. We report changes to the proteome profiles, with 354 proteins exhibiting a significant increase and 56 proteins showing a decrease in our statistical analysis. Our results show that adipocyte differentiation is involved not only in metabolic processes by increasing TCA cycle, fatty acid synthesis, lipolysis, acetyl-CoA production, antioxidants, and electron transport, but also in nicotinamide metabolism, cristae formation, mitochondrial protein import, and Ca2+ transport into mitochondria and ER. A search for the Chromosome-Centric Human Proteome Project (C-HPP) using neXtProt highlighted one protein with uncertain protein existence (PE5) and 17 proteins classified as functionally uncharacterized protein existence 1 (uPE1). This study provides quantitative information on proteome changes in adipogenic differentiation, which is helpful in improving our understanding of the processes of adipogenesis.
- Published
- 2020
9. Research on the Human Proteome Reaches a Major Milestone: 90% of Predicted Human Proteins Now Credibly Detected, According to the HUPO Human Proteome Project
- Author
-
Siqi Liu, Nuno Bandeira, Cecilia Lindskog, Stephen R. Pennington, Eric W. Deutsch, Robert L. Moritz, Lydie Lane, Ileana M. Cristea, Jennifer E. Van Eyk, Fernando J. Corrales, Michael Snyder, Young Ki Paik, Mark S. Baker, Christopher M. Overall, Gilbert S. Omenn, Ruedi Aebersold, National Institutes of Health (US), National Science Foundation (US), Swiss Institute of Bioinformatics, Canadian Institutes of Health Research, Canada Research Chairs, Knut and Alice Wallenberg Foundation, and Ministry of Health and Welfare (South Korea)
- Subjects
0301 basic medicine ,Proteomics ,PeptideAtlas ,Proteome ,Human Protein Atlas ,Genomics ,Computational biology ,Biology ,Biochemistry ,Mass Spectrometry ,Article ,03 medical and health sciences ,Human Proteome Project (HPP) ,Human proteome project ,Humans ,Missing proteins (MPs) ,Databases, Protein ,Gene ,Uncharacterized protein existence 1 (uPE1) ,Mass Spectrometry Interactive Virtual Environment (MassIVE) ,ddc:616 ,030102 biochemistry & molecular biology ,NeXtProt ,Genome, Human ,Chromosome-centric HPP (C-HPP) ,General Chemistry ,3. Good health ,non-MS PE1 proteins ,030104 developmental biology ,Human genome ,Biology and Disease-HPP (B/D-HPP) ,neXtProt protein existence (PE) metrics
- Abstract
According to the 2020 Metrics of the HUPO Human Proteome Project (HPP), expression has now been detected at the protein level for >90% of the 19 773 predicted proteins coded in the human genome. The HPP annually reports on progress made throughout the world toward credibly identifying and characterizing the complete human protein parts list and promoting proteomics as an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2020–01 classified 17 874 proteins as PE1, having strong protein-level evidence, up 180 from 17 694 one year earlier. These represent 90.4% of the 19 773 predicted coding genes (all PE1,2,3,4 proteins in neXtProt). Conversely, the number of neXtProt PE2,3,4 proteins, termed the “missing proteins” (MPs), was reduced by 230 from 2129 to 1899 since the neXtProt 2019–01 release. PeptideAtlas is the primary source of uniform reanalysis of raw mass spectrometry data for neXtProt, supplemented this year with extensive data from MassIVE. PeptideAtlas 2020–01 added 362 canonical proteins between 2019 and 2020 and MassIVE contributed 84 more, many of which converted PE1 entries based on non-MS evidence to the MS-based subgroup. The 19 Biology and Disease-driven B/D-HPP teams continue to pursue the identification of driver proteins that underlie disease states, the characterization of regulatory mechanisms controlling the functions of these proteins, their proteoforms, and their interactions, and the progression of transitions from correlation to coexpression to causal networks after system perturbations. And the Human Protein Atlas published Blood, Brain, and Metabolic Atlases.
G.S.O. acknowledges support from National Institutes of Health grants P30ES017885-01A1 and U24CA210967; E.W.D. from National Institutes of Health grants R01GM087221, R24GM127667, U19AG023122, and from National Science Foundation grant DBI-1933311; L.L. and neXtProt from the SIB Swiss Institute of Bioinformatics; C.M.O. by Canadian Institutes of Health Research Foundation Grant 148408 and a Canada Research Chair in Protease Proteomics and Systems Biology; M.S.B. by NHMRC Project Grant APP1010303; C.L. by the Knut and Alice Wallenberg Foundation for the Human Protein Atlas; and Y.-K.P. by grants from the Korean Ministry of Health and Welfare HI13C22098 and HI16C0257.
- Published
- 2020
10. Is It Possible to Find Needles in a Haystack? Meta-Analysis of 1000+ MS/MS Files Provided by the Russian Proteomic Consortium for Mining Missing Proteins
- Author
-
Olga I. Kiseleva, Alexander I. Archakov, Svetlana Novikova, N. E. Kushlinskii, Yuri D. Ivanov, Ekaterina V. Ilgisonis, Mikhail V. Gorshkov, Arthur T. Kopylov, Ekaterina V. Poverennaya, Elena A. Ponomarenko, and Alexei Kononikhin
- Subjects
NeXtProt ,Clinical Biochemistry ,lcsh:QR1-502 ,Computational biology ,Biology ,Proteomics ,ENCODE ,proteotypic peptide ,Biochemistry ,Article ,lcsh:Microbiology ,uncertain proteins ,Structural Biology ,Meta-analysis ,Human proteome project ,neXtProt ,Identification (biology) ,Human genome ,Chromosome-Centric Human Proteome Project (C-HPP) ,Haystack ,missing proteins ,Molecular Biology ,human proteome ,mass spectrometry
- Abstract
Despite direct or indirect efforts of the proteomic community, the fraction of blind spots on the protein map is still significant. Almost 11% of human genes encode missing proteins, the existence of which proteins is still in doubt. Apparently, proteomics has reached a stage when more attention and curiosity need to be exerted in the identification of every novel protein in order to expand the unusual types of biomaterials and/or conditions. It seems that we have exhausted the current conventional approaches to the discovery of missing proteins and may need to investigate alternatives. Here, we present an approach to deciphering missing proteins based on the use of non-standard methodological solutions and encompassing diverse MS/MS data, obtained for rare types of biological samples by members of the Russian Proteomic community in the last five years. These data were re-analyzed in a uniform manner by three search engines, which are part of the SearchGUI package. The study resulted in the identification of two missing and five uncertain proteins detected with two peptides. Moreover, 149 proteins were detected with a single proteotypic peptide. Finally, we analyzed the gene expression levels to suggest feasible targets for further validation of missing and uncertain protein observations, which will fully meet the requirements of the international consortium. The MS data are available on the ProteomeXchange platform (PXD014300).
- Published
- 2020
11. Extending Comet for global amino acid variant and post-translational modification analysis using the PSI extended FASTA format (PEFF)
- Author
-
Jimmy K. Eng and Eric W. Deutsch
- Subjects
Proteomics ,0303 health sciences ,NeXtProt ,Sequence database ,Proteomics Standards Initiative ,Proteome ,Computer science ,030302 biochemistry & molecular biology ,FASTA format ,Computational biology ,Biochemistry ,Post Translational Modification Analysis ,Article ,03 medical and health sciences ,HEK293 Cells ,Human proteome project ,Comet (programming) ,Humans ,Amino Acids ,Databases, Protein ,Molecular Biology ,Protein Processing, Post-Translational ,Software ,030304 developmental biology
- Abstract
Protein identification by tandem mass spectrometry sequence database searching is a standard practice in many proteomics laboratories. The de facto standard for the representation of sequence databases used as input to sequence database search tools is the FASTA format. The Human Proteome Organization's Proteomics Standards Initiative has developed an extension to the FASTA format termed the proteomics standards initiative extended FASTA format or PSI extended FASTA format (PEFF) where additional information such as structural annotations are encoded in the protein description lines. Comet has been extended to automatically analyze the post translational modifications and amino acid substitutions encoded in PEFF databases. Comet's PEFF implementation and example analysis results searching a HEK293 dataset against the neXtProt PEFF database are presented.
- Published
- 2020
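To make the idea of annotations encoded in the protein description line more concrete, the sketch below parses a PEFF-style header. It rests on assumptions that should be checked against the PEFF specification: that the description line holds a `prefix:accession` identifier followed by backslash-prefixed `Key=Value` pairs, and that `VariantSimple` lists `(position|residue)` items. The accession, protein name, and variants are invented, and this is not Comet's implementation.

```python
import re


def parse_peff_header(line: str) -> tuple[str, dict[str, str]]:
    """Split a PEFF-style description line into identifier and key/value pairs."""
    identifier, *pairs = line.lstrip(">").split(" \\")
    fields = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        fields[key] = value.strip()
    return identifier, fields


def parse_simple_variants(value: str) -> list[tuple[int, str]]:
    """Read '(position|residue)' items, assumed to describe substitutions."""
    variants = []
    for item in re.findall(r"\(([^()]*)\)", value):
        position, residue = item.split("|")[:2]
        variants.append((int(position), residue))
    return variants


# Hypothetical header line, not a real neXtProt entry.
header = r">nxp:NX_EXAMPLE-1 \PName=Example protein \Length=120 \VariantSimple=(12|A)(57|T)"

identifier, fields = parse_peff_header(header)
print(identifier)                                      # nxp:NX_EXAMPLE-1
print(fields["PName"])                                 # Example protein
print(parse_simple_variants(fields["VariantSimple"]))  # [(12, 'A'), (57, 'T')]
```

A search engine that understands such headers can generate variant and modified peptide candidates per entry instead of relying on a separately expanded sequence database.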
12. Multiproteases Combined with High-pH Reverse-Phase Separation Strategy Verified Fourteen Missing Proteins in Human Testis Tissue
- Author
-
Zhitang Lv, Yao Zhang, Yue Zhou, Fuchu He, Fengsong Liu, Yihao Wang, Jinshuai Sun, Yanchang Li, Yang Chen, Ping Xu, Degang Kong, Lei Chang, and Jiahui Shi
- Subjects
Male ,Proteomics ,0301 basic medicine ,Proteome ,Tissue sample ,Peptide ,Computational biology ,01 natural sciences ,Biochemistry ,Mass Spectrometry ,Liver carcinoma ,03 medical and health sciences ,Testis ,Human proteome project ,Humans ,Single amino acid ,chemistry.chemical_classification ,NeXtProt ,Chemistry ,Liver Neoplasms ,010401 analytical chemistry ,Genetic Variation ,General Chemistry ,0104 chemical sciences ,030104 developmental biology ,Human testis ,Electrophoresis, Polyacrylamide Gel ,Protein Processing, Post-Translational ,Peptide Hydrolases
- Abstract
Subsequent to conducting the Chromosome-Centric Human Proteome Project, we have focused on human testis-enriched missing proteins (MPs) since 2015. For protein coverage to be enhanced, a multiprotease strategy was used for separation of samples by 10% SDS-PAGE. For the separating efficiency to be improved, a high-pH reverse phase (RP) separation strategy was applied to fractionate complex samples in this study. A total of 11,558 proteins was identified, which is the largest proteome data set for single human tissue sample so far. On the basis of this large-scale data set, we verified 14 MPs (PE2) in neXtProt (2018-01) after spectrum quality analysis, isobaric post-translational modification, and single amino acid variant filtering, and synthesized peptide matching. Tissue expression analysis showed that 3 of 14 MPs were testis-specific proteins. Functional analysis showed that 10 of 14 MPs were closely related to liver tumor, liver carcinoma, and hepatocellular carcinoma. Another 100 MPs were listed as candidates but required additional verification information. All MS data sets have been deposited into the ProteomeXchange with the identifier PXD009737.
- Published
- 2018
13. Progress on Identifying and Characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project
- Author
-
Lydie Lane, Fernando J. Corrales, Siqi Liu, Gilbert S. Omenn, Eric W. Deutsch, Mark S. Baker, Young Ki Paik, Jennifer E. Van Eyk, Christopher M. Overall, Michael Snyder, and Jochen M. Schwenk
- Subjects
Proteomics ,0301 basic medicine ,Proteome ,SUMO protein ,Human Protein Atlas ,Guidelines as Topic ,Computational biology ,Biology ,Biochemistry ,Mass Spectrometry ,Article ,03 medical and health sciences ,Human proteome project ,Humans ,Protein Interaction Maps ,Databases, Protein ,ddc:616 ,030102 biochemistry & molecular biology ,NeXtProt ,General Chemistry ,International working group ,3. Good health ,030104 developmental biology ,Research Design ,PeptideAtlas ,Software
- Abstract
The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multi-omics studies in medicine and the life sciences. neXtProt release 2018-01-17, the baseline for this 6th annual HPP special issue of the Journal of Proteome Research, contains 17,470 PE1 proteins, 89% of all neXtProt predicted PE1–4 proteins, up from 17,008 in release 2017-01-23 and 13,975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16,092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15,798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.
- Published
- 2018
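The 2018 abstract splits the PE1 total into MS-based and non-MS evidence, which allows a consistency check on the reported numbers. The sketch below uses only figures quoted in that abstract; the variable names are illustrative.

```python
# Figures quoted in the 2018 HPP metrics abstract.
ms_based_pe1 = 16_092   # PE1 proteins supported by mass spectrometry
non_ms_pe1 = 1_378      # PE1 proteins supported by other protein studies
missing_pe2_4 = 2_186   # PE2-4 "missing proteins" in release 2018-01-17
pe1_previous = 17_008   # PE1 proteins in release 2017-01-23

pe1_total = ms_based_pe1 + non_ms_pe1
pe1_share = round(100 * pe1_total / (pe1_total + missing_pe2_4))
pe1_gain = pe1_total - pe1_previous

print(pe1_total)  # 17470
print(pe1_share)  # 89
print(pe1_gain)   # 462
```

The two evidence classes sum to the stated 17,470 PE1 proteins, and the PE1 share of all PE1–4 entries rounds to the stated 89%.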
14. Dissolving capability difference based sequential extraction: A versatile tool for in-depth membrane proteome analysis
- Author
-
Lihua Zhang, Qun Zhao, Yukui Zhang, Zhen Liang, Fei Fang, and Xiao Li
- Subjects
0301 basic medicine ,Spectrometry, Mass, Electrospray Ionization ,Cell signaling ,Proteome ,Biochemistry ,Analytical Chemistry ,HeLa ,03 medical and health sciences ,Tandem Mass Spectrometry ,Humans ,Environmental Chemistry ,Databases, Protein ,Cell adhesion ,Gene ,Spectroscopy ,Chromatography, Reverse-Phase ,Chromatography ,NeXtProt ,biology ,Chemistry ,Membrane Proteins ,biology.organism_classification ,Transmembrane domain ,030104 developmental biology ,Solubility ,Membrane protein ,HeLa Cells
- Abstract
Profiling membrane proteins would help reveal disease mechanisms and discover new drug targets, as they play essential roles in cellular signaling, substrate transport, and cell adhesion. However, the analysis of membrane proteins remains a challenge due to their high hydrophobicity, as well as the suppression effect of highly abundant soluble proteins. In this work, to achieve membrane proteome profiling, a sample preparation strategy based on sequential extraction at the protein level, assisted by a range of extraction reagents with different dissolving capabilities and followed by nano-RPLC-ESI-MS/MS analysis, was developed and applied to HeLa cell line analysis. It was found that with progressively harsher extraction reagents (i.e., 2 M NaCl, 4 M urea, 0.1 M Na2CO3, and 10% 1-dodecyl-3-methyl-imidazolium chloride (C12ImCl)), many more highly hydrophobic and low-abundance proteins were identified. With our developed strategy, 5553 of the identified proteins (4419 gene products) were annotated as membrane proteins and 2573 proteins (2183 gene products) have at least one transmembrane domain, which, to our best knowledge, is the most comprehensive membrane proteome dataset for the HeLa cell line. Notably, 110 of the identified membrane proteins were found in the “missing proteins” list of the neXtProt database. All of the above results indicate that our strategy has great potential to tackle the difficult but relevant task of identifying and profiling membrane proteins.
- Published
- 2016
15. Open-pFind Enhances the Identification of Missing Proteins from Human Testis Tissue
- Author
-
Junzhu Wu, Yao Zhang, Yihao Wang, Jiahui Shi, Ping Xu, Zhitang Lyu, Liping Zhao, Wen-Jun Li, Fuchu He, Fengsong Liu, Yanchang Li, Hong Wang, Jinshuai Sun, Lei Chang, and Shujia Wu
- Subjects
0301 basic medicine ,Male ,Proteomics ,Peptide ,Computational biology ,Biochemistry ,Mass Spectrometry ,03 medical and health sciences ,Search engine ,Testis ,Human proteome project ,medicine ,Humans ,Shotgun proteomics ,Databases, Protein ,chemistry.chemical_classification ,030102 biochemistry & molecular biology ,NeXtProt ,Chemistry ,Proteins ,General Chemistry ,Trypsin ,Amino acid ,Search Engine ,030104 developmental biology ,Proteome ,Protein Processing, Post-Translational ,Software ,medicine.drug
- Abstract
In recent years, high-throughput technologies have contributed to the development of a more precise picture of the human proteome. However, 2129 proteins remain listed as missing proteins (MPs) in the newest neXtProt release (2019-02). The main reasons for MPs are a low abundance, a low molecular weight, unexpected modifications, membrane characteristics, and so on. Moreover, >50% of the MS/MS data have not been successfully identified in shotgun proteomics. Open-pFind, an efficient open search engine, recently released by the pFind group in China, might provide an opportunity to identify these buried MPs in complex samples. In this study, proteins and potential MPs were identified using Open-pFind and three other search engines to compare their performance and efficiency with three large-scale data sets digested by three enzymes (Glu-C, Lys-C, and trypsin) with specificity on different amino acid (AA) residues. Our results demonstrated that Open-pFind identified 44.7-93.1% more peptide-spectrum matches and 21.3-61.6% more peptide sequences than the second-best search engine. As a result, Open-pFind detected 53.1% more MP candidates than MaxQuant and 8.8% more candidate MPs than Proteome Discoverer. In total, 5 (PE2) of the 124 MP candidates identified by Open-pFind were verified with 2 or 3 unique peptides containing more than 9 AAs by using a spectrum theoretical prediction with pDeep and synthesized peptide matching with pBuild after spectrum quality analysis, isobaric post-translational modification, and single amino acid variant filtering. These five verified MPs can be saved as PE1 proteins. In addition, three other MP candidates were verified with two unique peptides (one peptide containing more than 9 AAs and the other containing only 8 AAs), which was slightly lower than the criteria listed by C-HPP and required additional verification information. More importantly, unexpected modifications were detected in these MPs. All MS data sets have been deposited into ProteomeXchange with the identifier PXD015759.
- Published
- 2019
16. Utilization of the Proteome Data Deposited in SRMAtlas for Validating the Existence of the Human Missing Proteins in GPM
- Author
-
Amr Elguoshy, Bo Xu, Naohiko Kinoshita, Tadashi Yamamoto, Toshiaki Mitsui, Yoshitoshi Hirao, and Keiko Yamamoto
- Subjects
0301 basic medicine ,Proteomics ,Proteome ,Computational biology ,Biology ,Biochemistry ,03 medical and health sciences ,Human proteome project ,Humans ,Protein Interaction Maps ,Databases, Protein ,chemistry.chemical_classification ,030102 biochemistry & molecular biology ,NeXtProt ,Protein level ,Reproducibility of Results ,Translation (biology) ,General Chemistry ,Amino acid ,030104 developmental biology ,chemistry ,Spectral matching ,Human genome ,Peptides ,Software - Abstract
The Human Proteome Project (HPP) has made great efforts to clarify the existing evidence of human proteins since 2012. However, according to the recent release of neXtProt (2019-1), approximately 10% of all human genes still have inadequate or no experimental evidence of their translation at the protein level. They were categorized as missing proteins (PE2-PE4). To further the goal of HPP, we developed a two-step bioinformatic strategy that uses the SRMAtlas synthetic peptides corresponding to the missing proteins as an exclusive reference in order to explore their natural counterparts within GPM. In the first step, we searched the GPM for the non-nested SRMAtlas peptides corresponding to the missing proteins, taking into consideration only those detected via ≥2 non-nested unitypic/proteotypic peptides ("Stranded peptides") with length ≥9 amino acids in the same proteomic study. As a result, 51 missing proteins were newly detected in 35 different proteomic studies. In the second step, we validated these newly detected missing proteins based on matching the spectra of their synthetic and natural peptides in SRMAtlas and GPM, respectively. The results showed that 23 of the missing proteins with ≥2 non-nested peptides were validated by careful spectral matching.
- Published
- 2019
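The peptide criterion quoted in the abstract above (at least two non-nested peptides of 9 or more residues) can be illustrated with a minimal Python sketch. The function name and the example sequences are hypothetical, and the sketch assumes the input peptides have already been confirmed as uniquely mapping to one protein; it is not the authors' pipeline.

```python
from itertools import combinations


def has_two_non_nested(peptides, min_len=9):
    """Return True if at least one pair of peptides of >= min_len residues
    exists in which neither sequence is fully contained in the other.

    Assumption: every peptide passed in is already proteotypic
    (uniquely mapping); that check is outside this sketch.
    """
    eligible = sorted({p for p in peptides if len(p) >= min_len})
    return any(a not in b and b not in a for a, b in combinations(eligible, 2))


# Made-up sequences, for illustration only.
nested_only = ["AAAAGGGGKLLLR", "GGGGKLLLR"]      # second is inside the first
independent = ["AAAAGGGGKLLLR", "TTTTSSSSPPK"]    # neither contains the other
print(has_two_non_nested(nested_only))   # False
print(has_two_non_nested(independent))   # True
```

Testing every pair rather than greedily keeping the longest peptide avoids rejecting a protein whose two short, independent peptides both happen to sit inside a third, longer one.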
17. Proteomic and N-Terminomic TAILS Analyses of Human Alveolar Bone Proteins: Improved Protein Extraction Methodology and LysargiNase Digestion Strategies Increase Proteome Coverage and Missing Protein Identification
- Author
-
Jayachandran N. Kizhakkedathu, Christopher M. Overall, Nestor Solis, Ian R Matthew, and Peter A. Bell
- Subjects
0301 basic medicine ,Proteomics ,Tissue Protein Extraction ,Adolescent ,Chemical Fractionation ,Biochemistry ,Peptide Mapping ,Connexins ,Mass Spectrometry ,03 medical and health sciences ,Young Adult ,Protein purification ,Human proteome project ,medicine ,Alveolar Process ,Humans ,Trypsin ,Databases, Protein ,Edetic Acid ,030102 biochemistry & molecular biology ,NeXtProt ,Chemistry ,Proteins ,General Chemistry ,Terminal amine isotopic labeling of substrates ,030104 developmental biology ,Durapatite ,Solubility ,Isotope Labeling ,Proteome ,Female ,medicine.drug - Abstract
With 2129 proteins still classified by the Human Proteome Organisation Human Proteome Project (HPP) as "missing" without compelling evidence of protein existence (PE) in humans, we hypothesized that in-depth proteomic characterization of tissues that are technically challenging to access and extract would yield evidence for tissue-specific missing proteins. Paradoxically, although the skeleton is the most massive tissue system in humans, as one of the poorest characterized by proteomics, bone falls under the HPP umbrella term as a "rare tissue". Therefore, we aimed to optimize mineralized tissue protein extraction methodology and workflows for proteomic and data analyses of small quantities of healthy young adult human alveolar bone. Osteoid was solubilized by GuHCl extraction, with hydroxyapatite-bound proteins then released by ethylenediaminetetraacetic acid demineralization. A subsequent GuHCl solubilization extraction was followed by solid-phase digestion of the remaining insoluble cross-linked protein using trypsin and then 6 M urea dissolution incorporating LysC digestion. Bone extracts were digested in parallel using trypsin, LysargiNase, AspN, or GluC prior to liquid chromatography-mass spectrometry analysis. Terminal Amine Isotopic Labeling of Substrates was used to purify semitryptic peptides, identifying natural and proteolytic-cleaved neo N-termini of bone proteins. Our strategy enabled complete solubilization of the organic bone matrix leading to extensive categorization of bone proteins in different bone matrix extracts, and hence matrix compartments, for the first time. Moreover, this led to the high confidence identification of pannexin-3, a "missing protein", found only in the insoluble collagenous matrix and revealed for the first time by trypsin solid-phase digestion. We also found a singleton proteotypic peptide of another missing protein, meiosis inhibitor protein 1. 
We also identified 17 proteins classified in neXtProt as PE1 based on evidence other than from MS, termed non-MS PE1 proteins, including ≥9-mer proteotypic peptides of four proteins.
- Published
- 2019
18. Blinded Testing of Function Annotation for uPE1 Proteins by I-TASSER/COFACTOR Pipeline Using the 2018-2019 Additions to neXtProt and the CAFA3 Challenge
- Author
-
Gilbert S. Omenn, Yang Zhang, Chengxin Zhang, and Lydie Lane
- Subjects
ddc:616 ,0301 basic medicine ,I tasser ,030102 biochemistry & molecular biology ,biology ,NeXtProt ,Gene ontology ,Computer science ,Computational Biology ,Proteins ,Molecular Sequence Annotation ,General Chemistry ,Computational biology ,Biochemistry ,Cofactor ,Article ,03 medical and health sciences ,Annotation ,030104 developmental biology ,Protein structure ,biology.protein ,Humans ,UniProt ,Critical Assessment of Function Annotation ,Databases, Protein - Abstract
In 2018, we reported a hybrid pipeline that predicts protein structures with I-TASSER and function with COFACTOR. I-TASSER/COFACTOR achieved high Gene Ontology (GO) prediction accuracies of Fmax = 0.69 and 0.57 for molecular function (MF) and biological process (BP), respectively, on 100 comprehensively annotated proteins. Now we report blinded analyses of newly annotated proteins in the third Critical Assessment of Function Annotation (CAFA3) function prediction challenge and in neXtProt. For CAFA3 results released in May 2019, our predictions on 267 and 912 human proteins with newly annotated MF and BP terms achieved Fmax = 0.50 and 0.42, respectively, on "No Knowledge" proteins, and 0.51 and 0.74, respectively, on "Limited Knowledge" proteins. While COFACTOR consistently outperforms simple homology-based analysis, its accuracy still depends on template availability. Meanwhile, in neXtProt 2019-01, 25 proteins acquired new function annotation through literature curation at UniProt/Swiss-Prot. Before the release of these curated results, we submitted to neXtProt blinded predictions of free-text function annotation based on predicted GO terms. For 10 of the 25, a good match of free-text or GO term annotation was obtained. These blind tests represent rigorous assessments of I-TASSER/COFACTOR. neXtProt now provides links to precomputed I-TASSER/COFACTOR predictions for proteins without function annotation to facilitate experimental planning on "dark proteins".
- Published
- 2019
19. Worming into the Uncharacterized Human Proteome
- Author
-
Lydie Lane and Paula D. Duek
- Subjects
0301 basic medicine ,ved/biology.organism_classification_rank.species ,Gene Expression ,Computational biology ,Biology ,Biochemistry ,03 medical and health sciences ,Mice ,RNA interference ,Human proteome project ,Animals ,Humans ,Protein Interaction Maps ,Model organism ,Caenorhabditis elegans Proteins ,Databases, Protein ,Gene ,Caenorhabditis elegans ,ddc:616 ,030102 biochemistry & molecular biology ,NeXtProt ,Sequence Homology, Amino Acid ,ved/biology ,Membrane Proteins ,Nuclear Proteins ,Proteins ,General Chemistry ,biology.organism_classification ,Phenotype ,030104 developmental biology ,RNA Interference ,WormBase - Abstract
Using neXtProt release 2019-01-11, we manually curated a list of 1837 functionally uncharacterized human proteins. Using OrthoList 2, we found that 270 of them have homologues in Caenorhabditis elegans, including 60 with a one-to-one orthology relationship. According to annotations extracted from WormBase, the vast majority of these 60 worm genes have RNAi experimental data or mutant alleles, but manual inspection shows that only 15% have phenotypes that could be interpreted in terms of a specific function. One third of the worm orthologs have protein-protein interaction data, and two of these interactions are conserved in humans. The combination of phenotypic, protein-protein interaction, and gene expression data provides functional hypotheses for 8 uncharacterized human proteins. Experimental validation in human or orthologs is necessary before they can be considered for annotation.
- Published
- 2019
20. Proteomics Standards Initiative Extended FASTA Format
- Author
-
Peter R. Baker, Martin Eisenacher, Eric W. Deutsch, Tim Van Den Bossche, Robert J. Chalkley, Jim Shofstahl, Juan Antonio Vizcaíno, Luis Francisco Hernández Sánchez, Karl R. Clauser, Lydie Lane, Andrew Collins, Eugene A. Kapp, Sean L. Seymour, Gerhard Mayer, Pierre-Alain Binz, Luis Mendoza, Jimmy K. Eng, Gerben Menschaert, Yasset Perez-Riverol, Harald Barsnes, and Emanuele Alpi
- Subjects
0301 basic medicine ,Proteomics ,Biochemistry & Molecular Biology ,Computer science ,Information Storage and Retrieval ,PEFF ,Biochemistry ,Mass Spectrometry ,Article ,Proteomics Standards Initiative ,03 medical and health sciences ,Controlled vocabulary ,Humans ,PSI ,FASTA ,ddc:616 ,Information retrieval ,030102 biochemistry & molecular biology ,NeXtProt ,file formats ,FASTA format ,General Chemistry ,Biological Sciences ,File format ,Metadata ,PASTA ,030104 developmental biology ,Validator ,proteogenomics ,Chemical Sciences ,standards ,Generic health relevance ,UniProt ,Software ,Biotechnology - Abstract
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
- Published
- 2019
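To make the "FASTA plus extra metadata" idea in the PEFF abstract above concrete, the following Python sketch splits one entry header into an identifier and a dictionary of `\Key=value` annotations. It assumes the backslash-prefixed key layout used for PEFF entry lines; the function name is hypothetical, the example header is invented for illustration, and the specification at the URL cited in the abstract remains the authority on the actual grammar.

```python
import re


def parse_peff_header(line):
    """Split a PEFF-style entry header into (identifier, annotations).

    Assumption: annotations appear as backslash-prefixed Key=value pairs
    separated by whitespace. Values are kept as raw strings; no
    controlled-vocabulary validation is attempted.
    """
    if not line.startswith(">"):
        raise ValueError("entry header must start with '>'")
    identifier, _, rest = line[1:].strip().partition(" ")
    fields = {}
    # A value runs until the next "\Key=" token or the end of the line.
    for match in re.finditer(r"\\(\w+)=(.*?)(?=\s+\\\w+=|$)", rest):
        fields[match.group(1)] = match.group(2).strip()
    return identifier, fields


# Invented header, not taken from a real PEFF file.
header = r">nxp:NX_EXAMPLE-1 \PName=Example protein isoform 1 \GName=EXMPL \Length=110"
identifier, fields = parse_peff_header(header)
print(identifier)        # nxp:NX_EXAMPLE-1
print(fields["PName"])   # Example protein isoform 1
```

A plain FASTA reader that treats everything after `>` as free text would still read such a line, which is the backward compatibility the abstract describes; only the annotation-aware step above is new.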
21. Why are they missing? : Bioinformatics characterization of missing human proteins
- Author
-
Amr Elguoshy, Fawzy El-Fiky, Yusuke Takisawa, Naohiko Kinoshita, Keiko Yamamoto, Sameh Magdeldin, Ali El-Refy, Ying Zhang, Bo Xu, Tadashi Yamamoto, Masaaki Nameta, and Yoshitoshi Hirao
- Subjects
0301 basic medicine ,Proteome ,In silico ,Biophysics ,Datasets as Topic ,Biology ,Bioinformatics ,Peptide Mapping ,Biochemistry ,03 medical and health sciences ,Combination strategy ,Humans ,Computer Simulation ,Trypsin ,Amino Acid Sequence ,Databases, Protein ,Human proteins ,NeXtProt ,Tryptic peptide ,Computational Biology ,Protein level ,Endopeptidase ,Transmembrane domain ,030104 developmental biology ,Peptides ,Hydrophobic and Hydrophilic Interactions - Abstract
NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins; among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at the protein level (PE2:PE5). Therefore, they are considered "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these "missing" proteins. The aims of the current study were to analyze physicochemical properties and the existence and distribution of the tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). Also, forty missing entries generated tryptic peptides that were either out of mass detection range (30aa) or mapped to different proteins (9aa). Additionally, 21% of missing entries didn't generate any unique tryptic peptides. An in silico endopeptidase combination strategy increased the possibility of missing protein identification. Coherently, using both a mature protein database and a signal peptidome database could be a promising option to identify some missing proteins by targeting their unique N-terminal tryptic peptide from the mature protein database and/or C-terminus tryptic peptide from the signal peptidome database. In conclusion, identification of missing proteins requires additional consideration during sample preparation, extraction, digestion, and data analysis to increase the identification rate.
- Published
- 2016
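The in silico tryptic analysis described in the abstract above can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' workflow: it assumes the common rule of cleavage after K or R unless the next residue is P, allows no missed cleavages, and uses length limits of 9 and 30 residues that echo the figures quoted in the abstract (treating them as inclusive bounds is an assumption). The function names and the toy sequence are hypothetical.

```python
import re


def tryptic_digest(sequence):
    """Fully cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def classify_by_length(peptides, min_len=9, max_len=30):
    """Bin peptides as too short, within range, or too long.

    The bounds are assumptions made for this sketch.
    """
    bins = {"too_short": [], "in_range": [], "too_long": []}
    for pep in peptides:
        if len(pep) < min_len:
            bins["too_short"].append(pep)
        elif len(pep) > max_len:
            bins["too_long"].append(pep)
        else:
            bins["in_range"].append(pep)
    return bins


toy = "MKACDEFGHILMNKTTR"  # made-up sequence
peptides = tryptic_digest(toy)
print(peptides)                                  # ['MK', 'ACDEFGHILMNK', 'TTR']
print(classify_by_length(peptides)["in_range"])  # ['ACDEFGHILMNK']
```

Swapping the regular expression for another enzyme's cleavage rule is one way to explore the endopeptidase combination idea mentioned in the abstract.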
22. Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications
- Author
-
Lydie Lane, Christopher M. Overall, Eric W. Deutsch, Ronald C. Beavis, Gilbert S. Omenn, and Emma Lundberg
- Subjects
Proteomics ,0301 basic medicine ,Proteome ,Human Protein Atlas ,Guidelines as Topic ,Genomics ,Biology ,Polymorphism, Single Nucleotide ,Biochemistry ,Article ,Mass Spectrometry ,03 medical and health sciences ,Human proteome project ,Humans ,Protein Isoforms ,ddc:576 ,Databases, Protein ,NeXtProt ,General Chemistry ,Data science ,Compendium ,030104 developmental biology ,Disease Susceptibility ,PeptideAtlas ,Protein Processing, Post-Translational - Abstract
The HUPO Human Proteome Project (HPP) has two overall goals: (1) stepwise completion of the protein parts list, the draft human proteome, including confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on sequence variants, post-translational modifications (PTMs), and splice isoforms of those proteins; and (2) making proteomics an integrated counterpart to genomics throughout the biomedical and life sciences community. PeptideAtlas and GPMDB reanalyze all major human mass spectrometry data sets available through ProteomeXchange with standardized protocols and stringent quality filters; neXtProt curates and integrates mass spectrometry and other findings to present the most up-to-date authoritative compendium of the human proteome. The HPP Guidelines for Mass Spectrometry Data Interpretation version 2.1 were applied to manuscripts submitted for this 2016 C-HPP-led special issue [ www.thehpp.org/guidelines ]. The Human Proteome presented as neXtProt version 2016-02 has 16,518 confident protein identifications (Protein Existence [PE] Level 1), up from 13,664 at 2012-12, 15,646 at 2013-09, and 16,491 at 2014-10. There are 485 proteins that would have been PE1 under the Guidelines v1.0 from 2012 but now have insufficient evidence due to the agreed-upon more stringent Guidelines v2.0 to reduce false positives. neXtProt and PeptideAtlas now both require two non-nested, uniquely mapping (proteotypic) peptides of at least 9 aa in length. There are 2,949 missing proteins (PE2+3+4) as the baseline for submissions for this fourth annual C-HPP special issue of Journal of Proteome Research. PeptideAtlas has 14,629 canonical (plus 1187 uncertain and 1755 redundant) entries. GPMDB has 16,190 EC4 entries, and the Human Protein Atlas has 10,475 entries with supportive evidence. 
neXtProt, PeptideAtlas, and GPMDB are rich resources of information about post-translational modifications (PTMs), single amino acid variants (SAAVs), and splice isoforms. Meanwhile, the Biology- and Disease-driven (B/D)-HPP has created comprehensive SRM resources, generated popular protein lists to guide targeted proteomics assays for specific diseases, and launched an Early Career Researchers initiative.
- Published
- 2016
23. Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate
- Author
-
Ji Yeong Park, Hyoung Joo Lee, Eun Sun Ji, John R. Yates, Young Mok Park, Hyun Kyoung Lee, Kyung Hoon Kwon, Kwang Hoe Kim, Gun Wook Park, Heeyoun Hwang, Jin Young Kim, Sung Kyu Robin Park, Jong Shin Yoo, Ju Yeon Lee, and Young Ki Paik
- Subjects
Proteomics ,0301 basic medicine ,False discovery rate ,Biology ,computer.software_genre ,Hippocampus ,Biochemistry ,Mass Spectrometry ,03 medical and health sciences ,Search engine ,Mascot ,Human proteome project ,Humans ,False Positive Reactions ,Databases, Protein ,Proteogenomics ,030102 biochemistry & molecular biology ,NeXtProt ,Proteomic Profiling ,Computational Biology ,General Chemistry ,Pipeline (software) ,Search Engine ,Alternative Splicing ,030104 developmental biology ,Data mining ,computer - Abstract
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of 1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
- Published
- 2016
24. Towards a functional definition of the mitochondrial human proteome
- Author
-
Emma Lundberg, Andrea Urbani, Tiziana Alberio, Mohan Babu, and Mauro Fasano
- Subjects
0301 basic medicine ,NeXtProt ,lcsh:QH426-470 ,Special Section: Proceedings of the 9th Annual EuPA Congress “Proteomics - Back to the Future” (June 23 - 28, 2015, Milano, Italy) ,Computational biology ,Biology ,Mitochondrion ,Proteomics ,Biochemistry ,Mitochondrial proteome ,Cell biology ,03 medical and health sciences ,lcsh:Genetics ,030104 developmental biology ,Human proteome project ,Mitochondrial biology ,Mitochondrial protein ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Highlights: The mitochondrial proteome functionally includes cytoplasmic proteins. Mitochondrial proteomics studies may be mapped on a reference functional network. The mitochondrial proteome is a dynamic and chronosteric reality. The mitochondrial human proteome project (mt-HPP) was initiated by the Italian HPP group as a part of both the chromosome-centric initiative (C-HPP) and the biology and disease driven initiative (B/D-HPP). In recent years several reports highlighted how mitochondrial biology and disease are regulated by specific interactions with non-mitochondrial proteins. Thus, it is of great relevance to extend our present view of the mitochondrial proteome not only to those proteins that are encoded by or transported to mitochondria, but also to their interactors that take part in mitochondria functionality. Here, we propose a graphical representation of the functional mitochondrial proteome by retrieving mitochondrial proteins from the NeXtProt database and adding to the network their interactors as annotated in the IntAct database. Notably, the network may represent a reference to map all the proteins that are currently being identified in mitochondrial proteomics studies.
- Published
- 2016
25. Informatics View on the Challenges of Identifying Missing Proteins from Shotgun Proteomics
- Author
-
Ting-Yi Sung, Hui Yin Chang, Wai-Kok Choong, Ching-Tai Chen, Yu-Ju Chen, Chia-Feng Tsai, and Wen-Lian Hsu
- Subjects
Proteomics ,InterPro ,Proteome ,Annexins ,In silico ,Molecular Sequence Data ,Computational biology ,Biology ,Receptors, Odorant ,Biochemistry ,Mass Spectrometry ,Human proteome project ,Humans ,Computer Simulation ,Amino Acid Sequence ,Databases, Protein ,Shotgun proteomics ,Peptide sequence ,NeXtProt ,Computational Biology ,Genetic Variation ,Molecular Sequence Annotation ,General Chemistry ,Molecular biology ,Peptide Fragments ,Proteolysis ,Hydrophobic and Hydrophilic Interactions - Abstract
Experimental evidence at the protein level from mass spectrometry and antibody experiments is essential to characterize the human proteome. neXtProt (2014-09 release) reported 20 055 human proteins, including 16 491 proteins identified at protein level and 3564 proteins unidentified. Excluding 616 proteins at uncertain level, 2948 proteins were regarded as missing proteins. Missing proteins were unidentified partially due to MS limitations and intrinsic properties of proteins, for example, only appearing in specific diseases or tissues. Despite such reasons, it is desirable to explore issues affecting validation of missing proteins from an "ideal" shotgun analysis of the human proteome. We thus performed in silico digestions on the human proteins to generate all in silico fully digested peptides. With these presumed peptides, we investigated the identification of proteins without any unique peptide, the effect of sequence variants on protein identification, difficulties in identifying olfactory receptors, and highly similar proteins. Among all proteins with evidence at transcript level, G protein-coupled receptors and olfactory receptors, based on InterPro classification, were the largest families of proteins and exhibited more frequent variants. To identify missing proteins, the above analyses suggested including sequence variants in protein FASTA for database searching. Furthermore, evidence of unique peptides identified from MS experiments would be crucial for experimentally validating missing proteins.
- Published
- 2015
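The question raised in the abstract above, namely which proteins yield no unique peptide after a full in silico digestion, amounts to a peptide-to-protein mapping. The Python sketch below is a toy illustration under stated assumptions: trypsin as the enzyme (cleavage after K or R, not before P), no missed cleavages, no length filter, and a three-entry made-up "proteome". The function names are hypothetical and this is not the authors' implementation.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence):
    """Fully cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def proteins_without_unique_peptide(proteome):
    """Return the sorted accessions whose peptides all occur in other proteins.

    `proteome` maps accession -> amino acid sequence.
    """
    owners = defaultdict(set)  # peptide -> accessions that produce it
    for accession, sequence in proteome.items():
        for peptide in tryptic_digest(sequence):
            owners[peptide].add(accession)
    has_unique = {next(iter(accs)) for accs in owners.values() if len(accs) == 1}
    return sorted(set(proteome) - has_unique)


# Made-up accessions and sequences.
toy_proteome = {
    "P1": "AAAAKCCCCK",
    "P2": "AAAAKDDDDK",
    "P3": "AAAAK",  # its only peptide is shared with P1 and P2
}
print(proteins_without_unique_peptide(toy_proteome))  # ['P3']
```

Adding variant sequences as extra entries to the input dictionary would be one way to probe how sequence variants change peptide uniqueness, in the spirit of the abstract's suggestion.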
26. Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases
- Author
-
Heeyoun Hwang, Kwang Hoe Kim, Ju Yeon Lee, Hyoung Joo Lee, Young Mok Park, Jin Young Kim, Gun Wook Park, Kyung Hoon Kwon, Hyun Kyoung Lee, Eun Sun Ji, Jong Shin Yoo, Sung Kyu Robin Park, Tao Xu, Young Ki Paik, and John R. Yates
- Subjects
Proteomics ,Nonsynonymous substitution ,Molecular Sequence Data ,Single-nucleotide polymorphism ,Biology ,computer.software_genre ,Hippocampus ,Polymorphism, Single Nucleotide ,Biochemistry ,Workflow ,Alzheimer Disease ,Tandem Mass Spectrometry ,Databases, Genetic ,Human proteome project ,Chromosomes, Human ,Humans ,Amino Acid Sequence ,Databases, Protein ,Peptide sequence ,Genetics ,Epilepsy ,NeXtProt ,Database ,GENCODE ,Alternative splicing ,Genetic Variation ,General Chemistry ,Alternative Splicing ,Case-Control Studies ,computer ,Software ,Chromatography, Liquid - Abstract
The goal of the Chromosome-Centric Human Proteome Project (C-HPP) is to fully provide proteomic information from each human chromosome, including novel proteoforms, such as novel protein-coding variants expressed from noncoding genomic regions, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). In the 144 LC/MS/MS raw files from human hippocampal tissues of control, epilepsy, and Alzheimer's disease, we identified the novel proteoforms with a workflow including integrated proteomic pipeline using three different search engines, MASCOT, SEQUEST, and MS-GF+. With a
- Published
- 2015
27. Appraisal of the Missing Proteins Based on the mRNAs Bound to Ribosomes
- Author
-
Xun Xu, Quanhui Wang, Baojin Zhou, Guixue Hou, Yamei Deng, Siqi Liu, Liang Lin, Zhilong Lin, Jin Zi, Xin Liu, Shaohang Xu, Bo Wen, Zhe Ren, and Ruo Zhou
- Subjects
Proteomics ,RNA-Seq ,Computational biology ,Biology ,Bioinformatics ,Biochemistry ,Ribosome ,Tandem Mass Spectrometry ,Cell Line, Tumor ,Deoxyribonuclease I ,Humans ,RNA, Messenger ,Gene ,Messenger RNA ,NeXtProt ,Sequence Analysis, RNA ,Liver Neoplasms ,Computational Biology ,Proteins ,General Chemistry ,Chromatin ,Gene Ontology ,Cell culture ,Protein Biosynthesis ,Proteome ,Hydrophobic and Hydrophilic Interactions ,Ribosomes - Abstract
Considering the technical limitations of mass spectrometry in protein identification, the mRNAs bound to ribosomes (RNC-mRNA) are assumed to reflect the mRNAs participating in the translational process. The RNC-mRNA data are reasoned to be useful for appraising the missing proteins. A set of multiomics data including free-mRNAs, RNC-mRNAs, and proteomes was acquired from three liver cancer cell lines. On the basis of the missing proteins in neXtProt (release 2014-09-19), the bioinformatics analysis was carried out in three phases: (1) finding how many neXtProt missing proteins have or do not have RNA-seq and/or MS/MS evidence, (2) analyzing specific physicochemical and biological properties of the missing proteins that lack both RNA-seq and MS/MS evidence, and (3) analyzing the combined properties of these missing proteins. A total of 1501 missing proteins were found by neither RNC-mRNA nor MS/MS in the three liver cancer cell lines. Of these missing proteins, some are expected to have higher hydrophobicity, unsuitable detection, or sensory functions as properties at the protein level, while some are predicted to have nonexpressing chromatin structures at the corresponding gene level. With further integrated analysis, we could attribute 93% of them (1391/1501) to these causal factors, which result in the expression products scarcely detected by RNA-seq or MS/MS.
- Published
- 2015
28. Chromosome 17 Missing Proteins: Recent Progress and Future Directions as Part of the neXt-MP50 Challenge
- Author
-
Hongjiu Zhang, Gilbert S. Omenn, Omer Siddiqui, and Yuanfang Guan
- Subjects
0301 basic medicine ,Proteome ,In silico ,Human Protein Atlas ,Computational biology ,Biology ,Biochemistry ,Mass Spectrometry ,Article ,03 medical and health sciences ,0302 clinical medicine ,Chromosome (genetic algorithm) ,medicine ,Human proteome project ,Methods ,Humans ,Computer Simulation ,Protein Interaction Maps ,Olfactory receptor ,NeXtProt ,Proteins ,General Chemistry ,Chromosome 17 (human) ,030104 developmental biology ,medicine.anatomical_structure ,PeptideAtlas ,030217 neurology & neurosurgery ,Chromosomes, Human, Pair 17 - Abstract
The Chromosome-centric Human Proteome Project (C-HPP), announced in September 2016, is an initiative to accelerate progress on the detection and characterization of neXtProt PE2,3,4 "missing proteins" (MPs) with a mandate to each chromosome team to find about 50 MPs over 2 years. Here we report major progress toward the neXt-MP50 challenge with 43 newly validated Chr 17 PE1 proteins, of which 25 were based on mass spectrometry, 12 on protein-protein interactions, 3 on a combination of MS and PPI, and 3 with other types of data. Notable among these new PE1 proteins were five keratin-associated proteins, a single olfactory receptor, and five additional membrane-embedded proteins. We evaluate the prospects of finding the remaining 105 MPs coded for on Chr 17, focusing on mass spectrometry and protein-protein interaction approaches. We present a list of 35 prioritized MPs with specific approaches that may be used in further MS and PPI experimental studies. Additionally, we demonstrate how in silico studies can be used to capture individual peptides from major data repositories, documenting one MP that appears to be a strong candidate for PE1. We are close to our goal of finding 50 MPs for Chr 17.
- Published
- 2018
29. Structure and Protein Interaction-Based Gene Ontology Annotations Reveal Likely Functions of Uncharacterized Proteins on Human Chromosome 17
- Author
-
Gilbert S. Omenn, Xiaoqiong Wei, Chengxin Zhang, and Yang Zhang
- Subjects
0301 basic medicine ,030102 biochemistry & molecular biology ,NeXtProt ,Proteome ,Proteins ,Molecular Sequence Annotation ,General Chemistry ,Computational biology ,Protein structure prediction ,Biology ,Biochemistry ,Article ,Chromosome 17 (human) ,03 medical and health sciences ,030104 developmental biology ,Gene Ontology ,Chromosome (genetic algorithm) ,Human proteome project ,Humans ,Target protein ,Databases, Protein ,Gene ,Function (biology) ,Chromosomes, Human, Pair 17 - Abstract
Understanding the function of human proteins is essential to decipher the molecular mechanisms of human diseases and phenotypes. Of the 17 470 human protein coding genes in the neXtProt 2018-01-17 database with unequivocal protein existence evidence (PE1), 1260 proteins do not have characterized functions. To reveal the function of poorly annotated human proteins, we developed a hybrid pipeline that creates protein structure prediction using I-TASSER and infers functional insights for the target protein from the functional templates recognized by COFACTOR. As a case study, the pipeline was applied to all 66 PE1 proteins with unknown or insufficiently specific function (uPE1) on human chromosome 17 as of neXtProt 2017-07-01. Benchmark testing on a control set of 100 well-characterized proteins randomly selected from the same chromosome shows high Gene Ontology (GO) term prediction accuracies of 0.69, 0.57, and 0.67 for molecular function (MF), biological process (BP), and cellular component (CC), respectively. Three pipelines of function annotations (homology detection, protein-protein interaction network inference, and structure template identification) have been exploited by COFACTOR. Detailed analyses show that structure template detection based on low-resolution protein structure prediction made the major contribution to the enhancement of the sensitivity and precision of the annotation predictions, especially for cases that do not have sequence-level homologous templates. For the chromosome 17 uPE1 proteins, the I-TASSER/COFACTOR pipeline confidently assigned MF, BP, and CC for 13, 33, and 49 proteins, respectively, with predicted functions ranging from sphingosine N-acyltransferase activity and sugar transmembrane transporter to cytoskeleton constitution. We highlight the 13 proteins with confident MF predictions; 11 of these are among the 33 proteins with confident BP predictions and 12 are among the 49 proteins with confident CC. 
This study demonstrates a novel computational approach to systematically annotate protein function in the human proteome and provides useful insights to guide experimental design and follow-up validation studies of these uncharacterized proteins.
- Published
- 2018
30. Subcellular Proteome Landscape of Human Embryonic Stem Cells Revealed Missing Membrane Proteins
- Author
-
Chia-Li Han, Ching-Yu Chuang, Mehari Muuz Weldemariam, Hung-Chih Kuo, Ghasem Hosseini Salekdeh, Faezeh Shekari, Yu-Ju Chen, Wai-Kok Choong, Reta Birhanu Kitata, Maxey C. M. Chung, Ting-Yi Sung, Wei-Ting Hsu, and Fuchu He
- Subjects
0301 basic medicine ,Proteomics ,Proteome ,Cell ,Human Embryonic Stem Cells ,Biology ,Biochemistry ,03 medical and health sciences ,medicine ,Humans ,Cell Lineage ,030102 biochemistry & molecular biology ,NeXtProt ,Membrane Proteins ,Cell Differentiation ,General Chemistry ,Intracellular Membranes ,Embryonic stem cell ,Cell biology ,030104 developmental biology ,medicine.anatomical_structure ,Membrane protein ,Cytoplasm ,embryonic structures ,Cell fractionation - Abstract
Human embryonic stem cells (hESCs) have the capacity for self-renewal and multilineage differentiation, which are of clinical importance for regenerative medicine. Despite the significant progress of hESC study, the complete hESC proteome atlas, especially the surface protein composition, awaits delineation. According to the latest release of the neXtProt database (January 17, 2018; 19 658 PE1, 2, 3, and 4 human proteins), membrane proteins represent the major category (1047; 48%) among all 2186 missing proteins (MPs). We conducted a deep subcellular proteomics analysis of hESCs to identify the nuclear, cytoplasmic, and membrane proteins in hESCs and to mine missing membrane proteins in the very early cell status. To our knowledge, our study achieved the largest data set with confident identification of 11 970 unique proteins (1% false discovery rate at peptide, protein, and PSM levels), including the most comprehensive description of 6138 annotated membrane proteins in hESCs. Following the HPP guideline, we identified 26 gold (neXtProt PE2, 3, and 4 MPs) and 87 silver (potential MP candidates with a single unique peptide detected) MPs, of which 69 were membrane proteins, and the expression of 21 gold MPs was further verified either by multiple reaction monitoring mass spectrometry or by matching synthetic peptides in the PeptideAtlas database. Functional analysis of the MPs revealed their potential roles in the pluripotency-related pathways and the lineage- and tissue-specific differentiation processes. Our proteome map of hESCs may provide a rich resource not only for the identification of MPs in the human proteome but also for the investigation of self-renewal and differentiation of hESCs. All mass spectrometry data were deposited in ProteomeXchange via jPOST with identifier PXD009840.
- Published
- 2018
31. Identification of Missing Proteins in Human Olfactory Epithelial Tissue by Liquid Chromatography-Tandem Mass Spectrometry
- Author
-
Gi Taek Yee, Hyun Joo An, Heeyoun Hwang, Hyun Kyoung Lee, Ki Na Yun, Bonghee Lee, Ji Eun Jeong, Tae Seok Jeong, Jin Young Kim, Young Ki Paik, and Jong Shin Yoo
- Subjects
0301 basic medicine ,Proteomics ,Computational biology ,01 natural sciences ,Biochemistry ,03 medical and health sciences ,Olfactory mucosa ,Olfactory Mucosa ,Liquid chromatography–mass spectrometry ,medicine ,Humans ,Amino Acid Sequence ,Peptide sequence ,Olfactory receptor ,NeXtProt ,GENCODE ,Chemistry ,010401 analytical chemistry ,Alternative splicing ,Genetic Variation ,General Chemistry ,0104 chemical sciences ,Alternative Splicing ,030104 developmental biology ,medicine.anatomical_structure ,Peptides - Abstract
We performed proteomic analyses of human olfactory epithelial tissue to identify missing proteins using liquid chromatography–tandem mass spectrometry. Using a next-generation proteomic pipeline with a < 1.0% false discovery rate at the peptide and protein levels, we identified 3731 proteins, among which five were missing proteins (P0C7M7, P46721, P59826, Q658L1, and Q8N434). We validated the identified missing proteins using the corresponding synthetic peptides. No olfactory receptor (OR) proteins were detected in olfactory tissue, suggesting that detection of ORs would be very difficult. We also identified 49 and 50 alternative splicing variants mapped at the neXtProt and GENCODE databases, respectively, and 2000 additional single amino acid variants. This data set is available at the ProteomeXchange consortium via PRIDE repository (PXD010025).
- Published
- 2018
32. Launching the C-HPP neXt-CP50 pilot project for functional characterization of identified proteins with no known function
- Author
-
Fernando J. Corrales, Je-Yoel Cho, Lydie Lane, Jin Young Cho, Young Ki Paik, Sergio Encarnación-Guevara, Jong Shin Yoo, Siqi Liu, Gilbert S. Omenn, Yu-Ju Chen, Chae Yeon Kim, Takeshi Kawamura, Alexander I. Archakov, Joshua LaBaer, Ghasem Hosseini Salekdeh, Gilberto B. Domont, and Christopher M. Overall
- Subjects
0301 basic medicine ,ddc:616 ,030102 biochemistry & molecular biology ,NeXtProt ,Online database ,General Chemistry ,Computational biology ,Biology ,Biochemistry ,Homology (biology) ,03 medical and health sciences ,Open reading frame ,Annotation ,030104 developmental biology ,Proteome ,Human proteome project ,Human genome - Abstract
An important goal of the Human Proteome Organization (HUPO) Chromosome-centric Human Proteome Project (C-HPP) is to correctly define the number of canonical proteins encoded by their cognate open reading frames on each chromosome in the human genome. When identified with high confidence of protein evidence (PE), such proteins are termed PE1 proteins in the online database resource, neXtProt. However, proteins that have not been identified unequivocally at the protein level but that have other evidence suggestive of their existence (PE2-4) are termed missing proteins (MPs). The number of MPs has been reduced from 5511 in 2012 to 2186 in 2018 (neXtProt 2018-01-17 release). Although the annotation of the human proteome has made significant progress, the "parts list" alone does not inform function. Indeed, 1937 proteins representing ∼10% of the human proteome have no function either annotated from experimental characterization or predicted by homology to other proteins. Specifically, these 1937 "dark proteins" of the so-called dark proteome are composed of 1260 functionally uncharacterized but identified PE1 proteins, designated as uPE1, plus 677 MPs from categories PE2-PE4, which also have no known or predicted function and are termed uMPs. At the HUPO-2017 Annual Meeting, the C-HPP officially adopted the uPE1 pilot initiative, with 14 participating international teams later committing to demonstrate the feasibility of the functional characterization of large numbers of dark proteins (CP), starting first with 50 uPE1 proteins, in a stepwise chromosome-centric organizational manner. The second aim of the feasibility phase to characterize protein (CP) functions of 50 uPE1 proteins, termed the neXt-CP50 initiative, is to utilize a variety of approaches and workflows according to individual team expertise, interest, and resources so as to enable the C-HPP to recommend experimentally proven workflows to the proteome community within 3 years. 
The results from this pilot will not only be the cornerstone of a larger characterization initiative but also enhance understanding of the human proteome and integrated cellular networks for the discovery of new mechanisms of pathology, mechanistically informative biomarkers, and rational drug targets.
- Published
- 2018
33. Update of the functional mitochondrial human proteome network
- Author
-
Lydie Lane, Chiara Monti, Mauro Fasano, and Tiziana Alberio
- Subjects
0301 basic medicine ,Proteomics ,Proteomics methods ,Proteome ,Computational biology ,Biology ,Mitochondrion ,Biochemistry ,Protein–protein interaction ,protein-protein interaction ,Mitochondrial Proteins ,03 medical and health sciences ,0302 clinical medicine ,Human proteome project ,neXtProt ,Humans ,Protein Interaction Maps ,Mitochondrial protein ,ddc:616 ,NeXtProt ,Chemistry (all) ,General Chemistry ,Mitochondria ,030104 developmental biology ,mitochondria ,Mitochondrial Human Proteome Project ,network ,Reference database ,030217 neurology & neurosurgery ,Protein Interaction Map - Abstract
Because of the pivotal role of mitochondrial alterations in several diseases, the Human Proteome Organization (HUPO) has promoted in recent years an initiative to characterize the mitochondrial human proteome, the mitochondrial human proteome project (mt-HPP). Here we generated an updated version of the functional mitochondrial human proteome network, made by nodes (mitochondrial proteins) and edges (gold binary interactions), using data retrieved from neXtProt, the reference database for HPP metrics. The principal new concept suggested was the consideration of mitochondria-associated proteins (first interactors), which may influence mitochondrial functions. All of the proteins described as mitochondrial in the sublocation or the GO Cellular Component sections of neXtProt were considered. Their other subcellular and submitochondrial localizations have been analyzed. The network represents the effort to collect all of the high-quality binary interactions described so far for mitochondrial proteins and the possibility for the community to reuse the information collected. As a proof of principle, we mapped proteins with no function, to speculate on their role by the background knowledge of their interactors, and proteins described to be involved in Parkinson's Disease, a neurodegenerative disorder, where it is known that mitochondria play a central role.
- Published
- 2018
34. Exploring the Uncharacterized Human Proteome Using neXtProt
- Author
-
Lydie Lane, Alain Gateau, Paula D. Duek, and Amos Marc Bairoch
- Subjects
0301 basic medicine ,ddc:616 ,NeXtProt ,Proteome ,Genome, Human ,Systems biology ,Computational Biology ,Molecular Sequence Annotation ,General Chemistry ,Computational biology ,Biology ,Biochemistry ,Chromatin ,03 medical and health sciences ,Annotation ,030104 developmental biology ,0302 clinical medicine ,Functional annotation ,Human proteome project ,Methods ,Data Mining ,Humans ,Human genome ,Gene ,030217 neurology & neurosurgery - Abstract
20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them are still lacking functional annotation, either predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/proteins led to proposal of functional annotations for 113 proteins and to consolidation of a list of 1,862 uncharacterized human proteins. The advanced search functionality of neXtProt was used extensively in order to examine the landscape of the uncharacterized human proteome in terms of subcellular locations, protein-protein interactions, tissue expression, association with diseases, and 3D structure. Finally, a deep data mining in various publicly available resources allowed building functional hypotheses for 26 uncharacterized human proteins validated at protein level (uPE1). These hypotheses cover the fields of cilia biology, male reproduction, metabolism, nervous system, immunity, inflammation, RNA metabolism, and chromatin biology. They will require experimental validation before they can be considered for annotation. Despite technological progresses, the pace of human protein characterization studies is still slow. It could be accelerated by a better integration of existing knowledge resources and by initiating large collaborative projects involving specialists of different biology fields. We hope that our analysis will contribute to set up the ground for such collaborative approaches and will be exploited by the HUPO Human Proteome Project teams committed to characterize uPE1 proteins.
- Published
- 2018
35. Functional Networks of Highest-Connected Splice Isoforms: From The Chromosome 17 Human Proteome Project
- Author
-
Hong-Dong Li, Gilbert S. Omenn, Yang Zhang, Rajasree Menon, Bharat Panwar, Yuanfang Guan, and Brandon Govindarajoo
- Subjects
Genetics ,Regulation of gene expression ,Gene isoform ,Proteome ,NeXtProt ,Sequence Analysis, RNA ,Alternative splicing ,Proteins ,General Chemistry ,Biology ,Biochemistry ,Article ,Alternative Splicing ,Mice ,Human proteome project ,Animals ,Humans ,Protein Isoforms ,RNA, Messenger ,Gene ,Function (biology) ,Chromosomes, Human, Pair 17 - Abstract
Alternative splicing allows a single gene to produce multiple transcript-level splice isoforms from which the translated proteins may show differences in their expression and function. Identifying the major functional or canonical isoform is important for understanding gene and protein functions. Identification and characterization of splice isoforms is a stated goal of the HUPO Human Proteome Project and of neXtProt. Multiple efforts have catalogued splice isoforms as "dominant", "principal", or "major" isoforms based on expression or evolutionary traits. In contrast, we recently proposed highest connected isoforms (HCIs) as a new class of canonical isoforms that have the strongest interactions in a functional network and revealed their significantly higher (differential) transcript-level expression compared to nonhighest connected isoforms (NCIs) regardless of tissues/cell lines in the mouse. HCIs and their expression behavior in the human remain unexplored. Here we identified HCIs for 6157 multi-isoform genes using a human isoform network that we constructed by integrating a large compendium of heterogeneous genomic data. We present examples for pairs of transcript isoforms of ABCC3, RBM34, ERBB2, and ANXA7. We found that functional networks of isoforms of the same gene can show large differences. Interestingly, differential expression between HCIs and NCIs was also observed in the human on an independent set of 940 RNA-seq samples across multiple tissues, including heart, kidney, and liver. Using proteomic data from normal human retina and placenta, we showed that HCIs are a promising indicator of expressed protein isoforms exemplified by NUDFB6 and M6PR. Furthermore, we found that a significant percentage (20%, p = 0.0003) of human and mouse HCIs are homologues, suggesting their conservation between species. 
Our identified HCIs expand the repertoire of canonical isoforms and are expected to facilitate studying main protein products, understanding gene regulation, and possibly evolution. The network is available through our web server as a rich resource for investigating isoform functional relationships (http://guanlab.ccmb.med.umich.edu/hisonet). All MS/MS data are available at the ProteomeXchange Web site (http://www.proteomexchange.org) through their identifiers (retina: PXD001242, placenta: PXD000754).
- Published
- 2015
36. Finding Missing Proteins from the Epigenetically Manipulated Human Cell with Stringent Quality Criteria
- Author
-
Pengyuan Yang, Yang Chen, Qing-Yu He, Qing Wang, Jie Guo, Xinlei Lian, Gong Zhang, Xing-Feng Yin, Tong Wang, Wanling Zhang, Yaxing Li, Lijuan Yang, and Fei Lan
- Subjects
Genetics ,NeXtProt ,biology ,Molecular Sequence Data ,Proteins ,General Chemistry ,Computational biology ,Methylation ,Biochemistry ,In vitro ,Epigenesis, Genetic ,Histone ,Acetylation ,Cell Line, Tumor ,biology.protein ,Human proteome project ,Humans ,Amino Acid Sequence ,Epigenetics ,Gene - Abstract
The Chromosome-Centric Human Proteome Project (C-HPP) has made great progress in finding protein evidence (PE) for missing proteins (PE2-4 proteins as defined by neXtProt), a task that is becoming increasingly challenging. As a majority of samples tested in this field were from adult tissues/cells, the developmental stage specific or relevant proteins could be missed due to biological source availability. We posit that epigenetic interventions may help to partially bypass such a limitation by stimulating the expression of the "silenced" genes in adult cells, leading to the increased chance of finding missing proteins. In this study, we established in vitro human cell models to modify the histone acetylation, demethylation, and methylation with near physiological conditions. With mRNA-seq analysis, we found that histone modifications resulted in an overall increase in expressed genes, distributed evenly across different chromosomes. We identified 64 PE2-4 and six PE5 proteins by MaxQuant (FDR 1% at both protein and peptide levels) and 44 PE2-4 and 7 PE5 proteins by Mascot (FDR 1% at peptide level) searches, respectively. However, only 24 PE2-4 and five PE5 proteins in Mascot, and 12 PE2-4 and one PE5 protein in MaxQuant searches, respectively, passed our stringent manual spectrum inspection. Collectively, 27 PE2-4 and five PE5 proteins were identified from the epigenetically modified cells; among them, 19 PE2-4 and three PE5 proteins passed FDR 1% at both peptide and protein levels. Gene ontology analyses revealed that the PE2-4 proteins were significantly involved in development and spermatogenesis, although their physicochemical features had no statistical difference from the background. In addition, we presented an example of a suspicious PE5 peptide spectrum matched with unusual AA substitutions related to post-translational modification.
In conclusion, the epigenetically manipulated cell models should be a useful tool for finding missing proteins in C-HPP. The mass spectrometry data have been deposited to the iProx database (accession number: IPX00020200).
- Published
- 2015
37. State of the Human Proteome in 2014/2015 As Viewed through PeptideAtlas: Enhancing Accuracy and Coverage through the AtlasProphet
- Author
-
David Shteynberg, Eric W. Deutsch, Luis Mendoza, Robert L. Moritz, David S. Campbell, Zhi Sun, Ulrike Kusebauch, Caroline S. Chu, and Gilbert S. Omenn
- Subjects
Proteomics ,Molecular Sequence Data ,Biology ,computer.software_genre ,Biochemistry ,Article ,Set (abstract data type) ,03 medical and health sciences ,False positive paradox ,Human proteome project ,Humans ,Amino Acid Sequence ,Databases, Protein ,Shotgun proteomics ,030304 developmental biology ,0303 health sciences ,Sequence Homology, Amino Acid ,NeXtProt ,030302 biochemistry & molecular biology ,Proteins ,General Chemistry ,Amino Acid Substitution ,Proteome ,Data mining ,PeptideAtlas ,computer - Abstract
The Human PeptideAtlas is a compendium of the highest quality peptide identifications from over 1000 shotgun mass spectrometry proteomics experiments collected from many different labs, all reanalyzed through a uniform processing pipeline. The latest 2015-03 build contains substantially more input data than past releases, is mapped to a recent version of our merged reference proteome, and uses improved informatics processing and the development of the AtlasProphet to provide the highest quality results. Within the set of ~20,000 neXtProt primary entries, 14,070 (70%) are confidently detected in the latest build, 5% are ambiguous, 9% are redundant, leaving the total percentage of proteins for which there are no mapping detections at just 16% (3166), all derived from over 133 million peptide-spectrum matches identifying more than 1 million distinct peptides using AtlasProphet to characterize and classify the protein matches. Improved handling for detection and presentation of single amino-acid variants (SAAVs) reveals the detection of 5,326 uniquely mapping SAAVs across 2,794 proteins. With such a large amount of data, the control of false positives is a challenge. We present the methodology and results for maintaining rigorous quality, along with a discussion of the implications of the remaining sources of errors in the build. We check our uncertainty estimates against a set of olfactory receptor proteins not expected to be present in the set. We show how the use of synthetic reference spectra can provide confirmatory evidence for claims of detection of proteins with weak evidence.
- Published
- 2015
38. Special Enrichment Strategies Greatly Increase the Efficiency of Missing Proteins Identification from Regular Proteome Samples
- Author
-
Bei Zhen, Siqi Liu, Miaomiao Tian, Na Sensang, Lingsheng Chen, Zhi Qiang Wang, Yunping Zhu, Yuan Gao, Mingzhi Zhao, Yao Zhang, Feilin Wu, Na Su, Tao Zhang, Fuchu He, Fengxu Fan, Bo Wen, Songfeng Wu, Zhi Xiong, Ping Xu, Yanchang Li, Pengyuan Yang, and Chengpu Zhang
- Subjects
Adult ,Aged, 80 and over ,Male ,Genetics ,Proteome ,NeXtProt ,Proteins ,nutritional and metabolic diseases ,General Chemistry ,Computational biology ,Middle Aged ,Biology ,Biochemistry ,Cell Line ,Membrane protein ,Tandem Mass Spectrometry ,Human proteome project ,Humans ,Female ,Aged - Abstract
As part of the Chromosome-Centric Human Proteome Project (C-HPP) mission, laboratories all over the world have tried to map all missing proteins (MPs) since 2012. On the basis of the first and second Chinese Chromosome Proteome Database (CCPD 1.0 and 2.0) studies, we developed systematic enrichment strategies to identify MPs that fell into four classes: (1) low molecular weight (LMW) proteins, (2) membrane proteins, (3) proteins that contained various post-translational modifications (PTMs), and (4) nucleic acid-associated proteins. Of 8845 proteins identified in 7 data sets, 79 proteins were classified as MPs. Among data sets derived from different enrichment strategies, data sets for LMW and PTM yielded the most novel MPs. In addition, we found that some MPs were identified in multiple data sets, which implied that tandem enrichment methods might improve the ability to identify MPs. Moreover, low expression at the transcription level was the major cause of the "missing" of these MPs; however, MPs with higher expression level also evaded identification, most likely due to other characteristics such as LMW, high hydrophobicity, and PTM. By combining a stringent manual check of the MS2 spectra with peptide synthesis verification, we confirmed 30 MPs (neXtProt PE2 to PE4) and 6 potential MPs (neXtProt PE5) with authentic MS evidence. By integrating our large-scale data sets of CCPD 2.0, the number of identified proteins has increased considerably beyond simulation saturation. Here, we show that special enrichment strategies can break through the data saturation bottleneck, which could increase the efficiency of MP identification in future C-HPP studies. All 7 data sets have been uploaded to ProteomeXchange with the identifier PXD002255.
- Published
- 2015
39. Proteogenomic Analysis to Identify Missing Proteins from Haploid Cell Lines
- Author
-
JongKeon Song, Min-Sik Kim, Keiryn L. Bennett, Giulio Superti-Furga, Korbinian Bösl, Seung Eun Lee, Akhilesh Pandey, André C. Müller, Dijana Vitko, and Richard Kumaran Kandasamy
- Subjects
0301 basic medicine ,RNA, Untranslated ,Proteome ,RNA-Seq ,Computational biology ,Biology ,Haploidy ,Proteomics ,Orbitrap ,Biochemistry ,law.invention ,Cell Line ,03 medical and health sciences ,law ,Tandem Mass Spectrometry ,Human proteome project ,Humans ,Amino Acid Sequence ,Molecular Biology ,Gene ,Proteogenomics ,NeXtProt ,Sequence Analysis, RNA ,030104 developmental biology ,Transcriptome - Abstract
The Chromosome-centric Human Proteome Project aims at identifying and characterizing protein products encoded from all human protein-coding genes. As of early 2017, 19,837 protein-coding genes have been annotated in the neXtProt database including 2,691 missing proteins that have never been identified by mass spectrometry. Missing proteins may be of low abundance in many cell types or expressed only in a few cell types in the human body, such as sperm in the testis. In this study, we performed expression proteomics of two near-haploid cell lines, HAP1 and KBM-7, to hunt for missing proteins. Proteomes from the two haploid cell lines were analyzed on an LTQ Orbitrap Velos, producing a total of 200 raw mass spectrometry files. After applying 1% false discovery rates at both levels of peptide-spectrum matches and proteins, more than ten thousand proteins were identified from HAP1 and KBM-7, resulting in the identification of nine missing proteins. Next, unmatched spectra were searched against protein databases translated in three frames from non-coding RNAs derived from RNA-Seq data, resulting in 6 novel protein-coding regions after careful manual inspection. This study demonstrates that expression proteomics coupled to proteogenomic analysis can be employed to identify many annotated and unannotated missing proteins. This is the pre-peer reviewed version of the following article: [Proteogenomic Analysis to Identify Missing Proteins from Haploid Cell Lines], which has been published in final form at [http://onlinelibrary.wiley.com/doi/10.1002/pmic.201700386/abstract]. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
- Published
- 2017
40. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project
- Author
-
Christopher M. Overall, Emma Lundberg, Lydie Lane, Gilbert S. Omenn, and Eric W. Deutsch
- Subjects
0301 basic medicine ,Proteomics ,Proteome ,Human Protein Atlas ,Genomics ,Guidelines as Topic ,Computational biology ,Biology ,Bioinformatics ,Biochemistry ,Article ,Mass Spectrometry ,Transcriptome ,03 medical and health sciences ,Databases ,Human Genome Project ,Human proteome project ,Proteome/analysis ,Humans ,Databases, Protein ,Mass Spectrometry/methods ,ddc:616 ,NeXtProt ,Proteomics/methods/trends ,General Chemistry ,Protein/trends ,030104 developmental biology ,PeptideAtlas - Abstract
The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 ( https://hupo.org/Guidelines ), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 "missing proteins" (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.
- Published
- 2017
41. Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy
- Author
-
Suguru Saito, Ali F. Quadery, Amr Elguoshy, Toshiaki Mitsui, Tadashi Yamamoto, Bo Xu, Yoshitoshi Hirao, and Keiko Yamamoto
- Subjects
0301 basic medicine ,Proteomics ,Proteome ,Human Protein Atlas ,Biology ,computer.software_genre ,Biochemistry ,Transcriptome ,03 medical and health sciences ,medicine ,Human proteome project ,Data Mining ,Humans ,Tissue Distribution ,Databases, Protein ,Olfactory receptor ,Database ,NeXtProt ,Computational Biology ,General Chemistry ,030104 developmental biology ,medicine.anatomical_structure ,Identification (biology) ,Data mining ,computer - Abstract
In an attempt to complete the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) launched the journey of missing protein (MP) investigation in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1) are still considered as missing and uncertain proteins, respectively. Thus, in this study, we proposed a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of RNA expression pattern for missing proteins in the Human Protein Atlas showed that 28% of them, such as Olfactory receptor 1I1 (O60431), had no RNA expression, suggesting the necessity to consider uncommon tissues for transcriptomic and proteomic studies. Interestingly, 21% had elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in their elevated tissues. Additionally, the analysis of RNA expression level for missing proteins showed that 95% had no or low expression level (0-10 transcripts per million), indicating that low abundance is one of the major obstacles facing the detection of missing proteins. Moreover, missing proteins are predicted to generate fewer unique tryptic peptides than the identified proteins. Searching for these predicted unique tryptic peptides that correspond to missing and uncertain proteins in the experimental peptide list of open-access MS-based databases (PA, GPM) resulted in the detection of 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at(5 × 10
- Published
- 2017
42. Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases
- Author
-
Heeyoun Hwang, Ju Yeon Lee, Hyoung Joo Lee, Young Mok Park, Jong Shin Yoo, Hyun Kyoung Lee, John R. Yates, Gun Wook Park, Ji Yeong Park, Jin Young Kim, Young Ki Paik, Sung Kyu Robin Park, Ji Eun Jeong, and Kyung Hoon Kwon
- Subjects
0301 basic medicine ,False discovery rate ,Male ,Proteomics ,Biology ,computer.software_genre ,Biochemistry ,Hippocampus ,03 medical and health sciences ,Chromosome (genetic algorithm) ,Testis ,Human proteome project ,Chromosomes, Human ,Humans ,Databases, Protein ,Genetics ,Brain Chemistry ,NeXtProt ,Database ,GENCODE ,Alternative splicing ,General Chemistry ,Proteogenomics ,Pipeline (software) ,Spermatozoa ,Alternative Splicing ,030104 developmental biology ,computer ,Protein Processing, Post-Translational - Abstract
The Human Proteome Project aims to map all human proteins including missing proteins as well as proteoforms with post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are usually used to provide curated information on human coding genes. However, to find these proteoforms, we (Chr #11 team) first introduce a streamlined pipeline using customized and concatenated neXtProt and GENCODE (which originates from Ensembl), with controlled false discovery rate (FDR). Because of the large databases used in this pipeline, we used more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) to claim novel findings, such as GENCODE ASVs and missing proteins, from human hippocampus data set (MSV000081385) and ProteomeXchange (PXD007166). Using our next generation proteomic pipeline (nextPP) with neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. Additionally, by applying the pipeline to human brain related data sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs, namely ACTN4-012 (Chr. 19), DPYSL2-005 (Chr. 8), MPRIP-003 (Chr. 17), NCAM1-013 (Chr. 11), EPB41L1-017 (Chr. 20), AGAP1-004 (Chr. 2), and CPNE5-005 (Chr. 6), were identified from two or more data sets. The identified peptides of GENCODE ASVs were mapped onto novel exon insertions, alternative translations at the 5'-untranslated region, or novel protein coding sequence. Applying the pipeline to male reproductive organ related data sets, 52 GENCODE ASVs were identified from two testis (PXD000561 and PXD002179) data sets and a spermatozoa (PXD003947) data set. Four of the 52 GENCODE ASVs, namely RAB11FIP5-008 (Chr. 2), RP13-347D8.7-001 (Chr. X), PRDX4-002 (Chr. X), and RP11-666A8.13-001 (Chr. 17), were identified in all three samples.
- Published
- 2017
43. Multi-Protease Strategy Identifies Three PE2 Missing Proteins in Human Testis Tissue
- Author
-
Yao Zhang, Yang Chen, Tao Zhang, Yanchang Li, Yihao Wang, Wei Wei, Ping Xu, Yue Gao, and Fuchu He
- Subjects
0301 basic medicine ,Male ,Proteomics ,Proteases ,medicine.medical_treatment ,Biochemistry ,03 medical and health sciences ,Antigen ,Tandem Mass Spectrometry ,Testis ,Human proteome project ,medicine ,Humans ,Protease ,NeXtProt ,biology ,Proteins ,General Chemistry ,Trypsin ,Molecular biology ,030104 developmental biology ,Histone ,Proteome ,biology.protein ,Electrophoresis, Polyacrylamide Gel ,medicine.drug ,Chromatography, Liquid ,Peptide Hydrolases - Abstract
Although 5 years of the missing proteins (MPs) study have been completed, searching for MPs remains one of the core missions of the Chromosome-Centric Human Proteome Project (C-HPP). Following the next-50-MPs challenge of the C-HPP, we have focused on the testis-enriched MPs by various strategies since 2015. On the basis of the theoretical analysis of MPs (2017-01, neXtProt) using multiprotease digestion, we found that nonconventional proteases (e.g. LysargiNase, GluC) could improve the peptide diversity and sequence coverage compared with Trypsin. Therefore, a multiprotease strategy was used for searching more MPs in the same human testis tissues separated by 10% SDS-PAGE, followed by high resolution LC-MS/MS system (Q Exactive HF). A total of 7838 proteins were identified. Among them, three PE2 MPs in neXtProt 2017-01 have been identified: beta-defensin 123 (Q8N688, chr 20q), cancer/testis antigen family 45 member A10 (P0DMU9, chr Xq), and Histone H2A-Bbd type 2/3 (P0C5Z0, chr Xq). However, because only one unique peptide of ≥9 AA was identified in beta-defensin 123 and Histone H2A-Bbd type 2/3, respectively, further analysis indicates that each falls under the exceptions clause of the HPP Guidelines v2.1. After a spectrum quality check, isobaric PTM and single amino acid variant (SAAV) filtering, and verification with a synthesized peptide, and based on overlapping peptides from different proteases, these three MPs should be considered as exemplary examples of MPs found by exceptional criteria. Other MPs were considered as candidates but need further validation. All MS data sets have been deposited to the ProteomeXchange with identifier PXD006465.
- Published
- 2017
44. Systematic Proteogenomic Approach To Exploring a Novel Function for NHERF1 in Human Reproductive Disorder: Lessons for Exploring Missing Proteins
- Author
-
William S. Hancock, Chae Yeon Kim, Keun Na, Heon Shin, Gilbert S. Omenn, Eun-ah Kim, Jaeseung Lim, Jeong Min Shin, Hyung-Min Chung, Jin Young Cho, Jun Young Park, Hye Sun Kim, Seul Ki Jeong, Jong Sun Lim, Jihye Kim, Young Ki Paik, Sang Hee Jung, and Ah Reum Kang
- Subjects
0301 basic medicine ,Sodium-Hydrogen Exchangers ,Immunoblotting ,Computational biology ,Biology ,computer.software_genre ,Biochemistry ,Mass Spectrometry ,Article ,03 medical and health sciences ,Cell Movement ,Human proteome project ,Animals ,Humans ,Transgenes ,Caenorhabditis elegans ,Databases, Protein ,Human proteins ,Proteogenomics ,030102 biochemistry & molecular biology ,NeXtProt ,Reproduction ,Cell Differentiation ,General Chemistry ,Phosphoproteins ,Trophoblasts ,030104 developmental biology ,Strategic approach ,Proteome ,Genomic information ,Female ,Data mining ,computer ,Function (biology) - Abstract
One of the major goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to fill the knowledge gaps between human genomic information and the corresponding proteomic information. These gaps are due to “missing” proteins (MPs)—predicted proteins with insufficient evidence from mass spectrometry (MS), biochemical, structural, or antibody analyses—that currently account for 2579 of the 19587 predicted human proteins (neXtProt, 2017–01). We address some of the lessons learned from the inconsistent annotations of missing proteins in databases (DB) and demonstrate a systematic proteogenomic approach designed to explore a potential new function of a known protein. To illustrate a cautious and strategic approach for characterization of novel function in vitro and in vivo, we present the case of Na(+)/H(+) exchange regulatory cofactor 1 (NHERF1/SLC9A3R1, located at chromosome 17q25.1; hereafter NHERF1), which was mistakenly labeled as an MP in one DB (Global Proteome Machine Database; GPMDB, 2011–09 release) but was well known in another public DB and in the literature. As a first step, NHERF1 was determined by MS and immunoblotting for its molecular identity. We next investigated the potential new function of NHERF1 by carrying out the quantitative MS profiling of placental trophoblasts (PXD004723) and functional study of cytotrophoblast JEG-3 cells. We found that NHERF1 was associated with trophoblast differentiation and motility. To validate this newly found cellular function of NHERF1, we used the Caenorhabditis elegans mutant of nrfl-1 (a nematode ortholog of NHERF1), which exhibits a protruding vulva (Pvl) and egg-laying-defective phenotype, and performed genetic complementation work. The nrfl-1 mutant was almost fully rescued by the transfection of the recombinant transgenic construct that contained human NHERF1. These results suggest that NHERF1 could have a previously unknown function in pregnancy and in the development of human embryos. 
Our study outlines a stepwise experimental platform to explore new functions of ambiguously denoted candidate proteins and scrutinizes the mandated DB search for the selection of MPs to study in the future.
- Published
- 2017
45. Decoding the Effect of Isobaric Substitutions on Identifying Missing Proteins and Variant Peptides in Human Proteome
- Author
-
Yu-Ju Chen, Tung-Shing Mamie Lih, Ting-Yi Sung, and Wai-Kok Choong
- Subjects
0301 basic medicine ,chemistry.chemical_classification ,NeXtProt ,Proteome ,Peptide ,General Chemistry ,Biology ,Biochemistry ,Amino acid ,03 medical and health sciences ,030104 developmental biology ,chemistry ,Tandem Mass Spectrometry ,Human proteome project ,Humans ,Protein Isoforms ,Splice isoforms ,Single amino acid ,Amino Acid Sequence ,PeptideAtlas ,Databases, Protein ,Peptides - Abstract
To confirm the existence of missing proteins, we need to identify at least two unique peptides with length of 9-40 amino acids of a missing protein in bottom-up mass-spectrometry-based proteomic experiments. However, an identified unique peptide of the missing protein, even identified with high level of confidence, could possibly coincide with a peptide of a commonly observed protein due to isobaric substitutions, mass modifications, alternative splice isoforms, or single amino acid variants (SAAVs). Besides unique peptides of missing proteins, identified variant peptides (SAAV-containing peptides) could also alternatively map to peptides of other proteins due to the aforementioned issues. Therefore, we conducted a thorough comparative analysis on data sets in PeptideAtlas Tiered Human Integrated Search Proteome (THISP, 2017-03 release), including neXtProt (2017-01 release), to systematically investigate the possibility of unique peptides in missing proteins (PE2-4), unique peptides in dubious proteins, and variant peptides affected by isobaric substitutions, causing doubtful identification results. In this study, we considered 11 isobaric substitutions. From our analysis, we found 5% of the unique peptides of missing proteins and 6% of variant peptides became shared with peptides of PE1 proteins after isobaric substitutions.
- Published
- 2017
46. Human Prestin: A Candidate PE1 Protein Lacking Stringent Mass Spectrometric Evidence?
- Author
-
Mark S. Baker, Shoba Ranganathan, Seong Beom Ahn, Varun K. A. Sreenivasan, and Abidali Mohamedali
- Subjects
0301 basic medicine ,Proteome ,Anion Transport Proteins ,Peptide ,Computational biology ,Bioinformatics ,Biochemistry ,Mass Spectrometry ,Transcriptome ,03 medical and health sciences ,0302 clinical medicine ,Human proteome project ,Humans ,Prestin ,chemistry.chemical_classification ,Chromosome 7 (human) ,biology ,NeXtProt ,General Chemistry ,Mass spectrometric ,030104 developmental biology ,chemistry ,Sulfate Transporters ,biology.protein ,Low copy number ,030217 neurology & neurosurgery - Abstract
The evidence that any protein exists in the Human Proteome Project (HPP; protein evidence 1 or PE1) has revolved primarily (although not exclusively) around mass spectrometry (MS) (93% of PE1 proteins have MS evidence in the latest neXtProt release), with robust and stringent, well-curated metrics that have served the community well. This has led to a significant number of proteins still considered "missing" (i.e., PE2-4). Many PE2-4 proteins have MS evidence of unacceptable quality (small or not enough unitypic peptides and unacceptably high protein/peptide FDRs), transcriptomic, or antibody evidence. Here we use a Chromosome 7 PE2 example called Prestin to demonstrate that clear and robust criteria/metrics need to be developed for proteins that may not or cannot produce clear-cut MS evidence while possessing significant non-MS evidence, including disease-association data. Many of the PE2-4 proteins are inaccessible, spatiotemporally expressed in a limited way, or expressed at such a very low copy number as to be unable to be detected by current MS methodologies. We propose that the HPP community consider and lead a communal initiative to accelerate the discovery and characterization of these types of "missing" proteins.
- Published
- 2017
47. Advances in the chromosome-centric human proteome project: looking to the future
- Author
-
Christopher M. Overall, Gilbert S. Omenn, Young Ki Paik, William S. Hancock, and Lydie Lane
- Subjects
0301 basic medicine ,Proteomics ,Proteome ,Computer science ,Human Protein Atlas ,Computational biology ,Biochemistry ,Article ,Chromosomes ,Mass Spectrometry ,03 medical and health sciences ,Databases ,Chromosome (genetic algorithm) ,Human proteome project ,Chromosomes, Human ,Humans ,Databases, Protein ,Molecular Biology ,Protein Processing ,ddc:616 ,Scope (project management) ,NeXtProt ,Protein ,Post-Translational ,Computational Biology ,Computational Biology/methods ,Proteome/genetics ,Identification (information) ,030104 developmental biology ,PeptideAtlas ,Proteomics/methods ,Protein Processing, Post-Translational ,Human - Abstract
The mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to map and annotate the entire predicted human protein set (~20,000 proteins) encoded by each chromosome. The initial steps of the project are focused on 'missing proteins (MPs)', which lacked documented evidence for existence at protein level. In addition to the remaining 2,579 MPs, we also target those annotated proteins having unknown functions, uPE1 proteins, alternative splice isoforms and post-translational modifications. We also consider how to investigate various protein functions involved in cis-regulatory phenomena, amplicons, lncRNAs, and smORFs. Areas covered: We will cover the scope, historic background, progress, challenges and future prospects of C-HPP. This review also addresses the question of how we can best improve the methodological approaches, select the optimal biological samples, and recommend stringent protocols for the identification and characterization of MPs. A new strategy for functional analysis of some of those annotated proteins having unknown function will also be discussed. Expert commentary: If the project moves well by reshaping the original goals, the current working modules and team work in the proposed extended planning period, it is anticipated that a progressively more detailed draft of an accurate chromosome-based proteome map will become available with functional information.
- Published
- 2017
48. Toward the Standardization of Mitochondrial Proteomics: The Italian Mitochondrial Human Proteome Project Initiative
- Author
-
Andrea Urbani, Marianna Caterino, Elisa Maffioli, Cristina Banfi, Tiziana Alberio, Barbara Garavaglia, Luisa Pieroni, Simona Fontana, Roberto Scatena, Giuseppe Petrosillo, Paola Roncada, Vito Porcelli, Patrizia Bottoni, Clara Musicco, Laura Giusti, Mara Zilocchi, Antonio Lucacchini, Gabriella Tedeschi, Fulvio Magni, Vincenzo Cunsolo, Valentina Monti, Flora Cozzolino, Italia Bongarzone, Alessio Soggiu, Viviana Greco, Rosaria Saletti, Francesca Monteleone, Clizia Chinello, Mauro Fasano, Maura Brioschi, Antonella Cormio, Maurizio Ronci, and Maria Chiara Monti
- Subjects
Proteomics ,0301 basic medicine ,Proteome ,Standardization ,Computational biology ,Biology ,Mitochondrion ,Bioinformatics ,Biochemistry ,enrichment protocol ,mitochondria ,Mitochondrial Human Proteome Project ,standardization ,Cell Line ,Mitochondrial Proteins ,03 medical and health sciences ,0302 clinical medicine ,Tandem Mass Spectrometry ,Human proteome project ,Humans ,Protein Interaction Maps ,Settore BIO/10 - BIOCHIMICA ,Mitochondrial protein ,Chromatography ,Liquid ,NeXtProt ,Chemistry (all) ,General Chemistry ,030104 developmental biology ,Italy ,Reference database ,Chromatography, Liquid ,Mitochondria ,030217 neurology & neurosurgery - Abstract
The Mitochondrial Human Proteome Project aims at understanding the function of the mitochondrial proteome and its crosstalk with the proteome of other organelles. Being able to choose a suitable and validated enrichment protocol of functional mitochondria, based on the specific needs of the downstream proteomics analysis, would greatly help the researchers in the field. Mitochondrial fractions from ten model cell lines were prepared using three enrichment protocols and analyzed on seven different LC-MS/MS platforms. All data were processed using neXtProt as reference database. The data are available for the Human Proteome Project purposes through the ProteomeXchange Consortium with the identifier PXD007053. The processed data sets were analyzed using a suite of R routines to perform a statistical analysis and to retrieve subcellular and submitochondrial localizations. Although the overall number of identified total and mitochondrial proteins was not significantly dependent on the enrichment protocol, specific line to line differences were observed. Moreover, the protein lists were mapped to a network representing the functional mitochondrial proteome, encompassing mitochondrial proteins and their first interactors. More than 80% of the identified proteins resulted in nodes of this network but with a different ability in coisolating mitochondria-associated structures for each enrichment protocol/cell line pair.
- Published
- 2017
49. Proteogenomics: concepts, applications and computational strategies
- Author
-
Alexey I. Nesvizhskii
- Subjects
Proteomics ,Genetics ,Proteomics methods ,Proteome ,NeXtProt ,Extramural ,Genetic Variation ,High-Throughput Nucleotide Sequencing ,Genomics ,Cell Biology ,Computational biology ,Biology ,Proteogenomics ,Biochemistry ,Mass Spectrometry ,Article ,Sequence Analysis, Protein ,Protein Isoforms ,Cancer biology ,Databases, Nucleic Acid ,Databases, Protein ,Molecular Biology ,Biotechnology - Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of next generation sequencing technologies such as RNA-Seq and dramatic improvements in the depths and throughput of mass spectrometry-based proteomics, the pace of proteogenomics research has greatly accelerated. Here I review the current state of proteogenomics methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positives in proteogenomics, and provide guidelines for analyzing the data and reporting the results of proteogenomics studies.
- Published
- 2014
50. Chromosome-centric Human Proteome Project: Deciphering Proteins Associated with Glioma and Neurodegenerative Disorders on Chromosome 12
- Author
-
Anil K. Madugundu, Jayshree Advani, Ravi Sirdeshmukh, Akhilesh Pandey, Manoj Kumar Gupta, Sandip Chavan, Savita Jayaram, and Visith Thongboonkerd
- Subjects
Genetics ,Chromosomes, Human, Pair 12 ,Proteome ,NeXtProt ,Human Protein Atlas ,Chromosome Mapping ,Molecular Sequence Annotation ,Parkinson Disease ,General Chemistry ,Biology ,Biochemistry ,Peptide Fragments ,Open Reading Frames ,Alzheimer Disease ,Tandem Mass Spectrometry ,Multigene Family ,Human proteome project ,Humans ,Ensembl ,Amino Acid Sequence ,PeptideAtlas ,Glioblastoma ,Gene ,Chromosome 12 - Abstract
In line with the aims of the Chromosome-centric Human Proteome Project (C-HPP) to completely annotate proteins of each chromosome and biology/disease driven HPP (B/D-HPP) to decipher their relation to diseases, we have generated a nonredundant catalogue of protein-coding genes for Chromosome 12 (Chr. 12) and further annotated proteins associated with major neurological disorders. Integrating high level proteomic evidence from four major databases (neXtProt, Global Proteome Machine (GPMdb), PeptideAtlas, and Human Protein Atlas (HPA)) along with Ensembl data resource resulted in the identification of 1066 protein coding genes, of which 171 were defined as "missing proteins" based on the weak or complete absence of experimental evidence. With functional annotations using DAVID and GAD, about 40% of the proteins could be grouped as brain related with implications in cancer or neurological disorders. We used published and unpublished high confidence mass spectrometry data from our group and other literature consisting of more than 5000 proteins derived from clinical specimens from patients with human gliomas, Alzheimer's disease, and Parkinson's disease and mapped it onto Chr. 12. We observed a total of 202 proteins mapping to human Chr. 12, 136 of which were differentially expressed in these disease conditions as compared to the normal. Functional grouping indicated their association with cell cycle, cell-to-cell signaling, and other important processes and networks, whereas their disease association analysis confirmed neurological diseases and cancer as the major group along with psychological disorders, with several overexpressed genes/proteins mapping to 12q13-15 amplicon region. 
Using multiple strategies and bioinformatics tools, we identified 103 differentially expressed proteins to have secretory potential, 17 of which have already been reported in direct analysis of the plasma or cerebrospinal fluid (CSF) from the patients and 21 of them mapped to cancer associated protein (CAPs) database that are amenable to selective reaction monitoring (SRM) assays for targeted proteomic analysis. Our analysis also reveals, for the first time, mass spectrometric evidence for two "missing proteins" from Chr. 12, namely, synaptic vesicle 2-related protein (SVOP) and IQ motif containing D (IQCD). The analysis provides a snapshot of Chr. 12 encoded proteins associated with gliomas and major neurological conditions and their secretability which can be used to drive efforts for clinical applications.
- Published
- 2014