1. Mass spectrometry-based identification and characterization of human hypothetical proteins highlighting the inconsistency across the protein databases
- Author
Johny Ijaq, Medicharla V. Jagannadham, and Neeraja Bethi
- Subjects
Gel electrophoresis, 0303 health sciences, Hypothetical protein, Genome browser, Computational biology, Biology, Proteomics, 03 medical and health sciences, 0302 clinical medicine, Human proteome project, Ensembl, UniProt, 030217 neurology & neurosurgery, 030304 developmental biology, Reference genome
- Abstract
A myriad of predicted proteins have been described at the genome scale, but their existence has not been confirmed at the protein level. Proteins that are predicted to be expressed from an open reading frame (ORF) but for which translation has not been demonstrated are known as hypothetical proteins, and they constitute a major fraction of the human proteome. In this study, we aim to identify and characterize hypothetical proteins from human tumor cell lines, viz., HeLa, MCF7, and BT474, thus providing the analytical basis for their expression. We used gel electrophoresis followed by in-gel digestion of the selected protein lanes and subsequent LC–MS/MS analysis of the tryptic digests. The Ensembl genome browser was used for genomic alignment. A search against human hypothetical protein data from the NCBI database identified 110 proteins common to the three selected cell lines. Of these, 88 proteins were already functionally characterized, and the remaining 22 were still unreviewed in UniProt, lacking evidence of expression at the protein level. To explore them further, following HPP guidelines, 15 proteins were selected and aligned against the human reference genome. Five hypothetical proteins were confirmed as isoforms of known proteins. We conclude that the proteomic approach used would serve as a suitable tool to validate the existence of predicted or hypothetical proteins at the protein level. The MS proteomics data have been deposited to the ProteomeXchange Consortium via PRIDE with the data set identifier PXD014258.
- Published
- 2020