1. Uncovering Thousands of New Peptides with Sequence-Mask-Search Hybrid De Novo Peptide Sequencing Framework
- Authors
Korrawe Karunratanakul, David W. Speicher, Ekapol Chuangsuwanich, Sira Sriswasdi, and Hsin-Yao Tang
- Subjects
chemistry.chemical_classification, 0303 health sciences, 030302 biochemistry & molecular biology, Peptide, De novo peptide sequencing, Human leukocyte antigen, Computational biology, Biology, Proteomics, Biochemistry, Analytical Chemistry, Amino acid, 03 medical and health sciences, chemistry, Proteome, Identification (biology), Molecular Biology, 030304 developmental biology, Sequence (medicine)
- Abstract
Typical analyses of mass spectrometry data only identify amino acid sequences that exist in reference databases. This restricts the possibility of discovering new peptides, such as those that contain uncharacterized mutations or originate from unexpected processing of RNAs and proteins. De novo peptide sequencing approaches address this limitation but often suffer from low accuracy and require extensive validation by experts. Here, we develop SMSNet, a deep learning-based de novo peptide sequencing framework that achieves >95% amino acid accuracy while retaining good identification coverage. Applications of SMSNet to landmark proteomics and peptidomics studies reveal over 10,000 previously uncharacterized HLA antigens and phosphopeptides and, in conjunction with database-search methods, expand the coverage of peptide identification by almost 30%. SMSNet's ability to accurately identify new peptides would make it an invaluable tool for future proteomics and peptidomics studies, including tumor neoantigen discovery, antibody sequencing, and proteome characterization of non-model organisms.
- Published
2019