114 results on '"Alexey I. Nesvizhskii"'
Search Results
2. Implementing the MSFragger Search Engine as a Node in Proteome Discoverer
- Author
Hui-Yin Chang, Sarah E. Haynes, Fengchao Yu, and Alexey I. Nesvizhskii
- Subjects
General Chemistry; Biochemistry
- Abstract
Here, we describe the implementation of the fast proteomics search engine MSFragger as a processing node in the widely used Proteome Discoverer (PD) software platform. PeptideProphet (via the Philosopher tool kit) is also implemented as an additional PD node to allow validation of MSFragger open (mass-tolerant) search results. These two nodes, along with the existing Percolator validation module, allow users to employ different search strategies and conveniently inspect search results through PD. Our results have demonstrated the improved numbers of PSMs, peptides, and proteins identified by MSFragger coupled with Percolator and significantly faster search speed compared to the conventional SEQUEST/Percolator PD workflows. The MSFragger-PD node is available at https://github.com/nesvilab/PD-Nodes/releases/.
- Published
- 2022
3. Comparative Evaluation of Proteome Discoverer and FragPipe for the TMT-Based Proteome Quantification
- Author
Tianen He, Youqi Liu, Yan Zhou, Lu Li, He Wang, Shanjun Chen, Jinlong Gao, Wenhao Jiang, Yi Yu, Weigang Ge, Hui-Yin Chang, Ziquan Fan, Alexey I. Nesvizhskii, Tiannan Guo, and Yaoting Sun
- Subjects
Proteomics; Proteome; Tandem Mass Spectrometry; General Chemistry; Biochemistry; Software
- Abstract
Isobaric labeling-based proteomics is widely applied in deep proteome quantification. Among the platforms for isobaric labeled proteomic data analysis, the commercial software Proteome Discoverer (PD) is widely used, incorporating the search engine CHIMERYS, while FragPipe (FP) is relatively new, free for noncommercial purposes, and integrates the engine MSFragger. Here, we compared PD and FP over three public proteomic data sets labeled using 6plex, 10plex, and 16plex tandem mass tags. Our results showed that the protein abundances generated by the two software packages are highly correlated. PD quantified more proteins (10.02%, 15.44%, 8.19%) than FP with comparable NA ratios (0.00% vs. 0.00%, 0.85% vs. 0.38%, and 11.74% vs. 10.52%) in the three data sets. Using the 16plex data set, PD and FP outputs showed high consistency in quantifying technical replicates, batch effects, and functional enrichment in differentially expressed proteins. However, FP saved 93.93%, 96.65%, and 96.41% of processing time compared to PD for analyzing the three data sets, respectively. In conclusion, while PD is well-maintained commercial software that integrates various additional functions and can quantify more proteins, FP is freely available and achieves similar output with a shorter computational time. Our results will guide users in choosing the most suitable quantification software for their needs.
- Published
- 2022
4. Recent advances in computational algorithms and software for large-scale glycoproteomics
- Author
Daniel A. Polasky and Alexey I. Nesvizhskii
- Subjects
Biochemistry; Analytical Chemistry
- Abstract
Glycoproteomics, or characterizing glycosylation events at a proteome scale, has seen rapid advances in methods for analyzing glycopeptides by tandem mass spectrometry in recent years. These advances have enabled acquisition of far more comprehensive and large-scale datasets, precipitating an urgent need for improved informatics methods to analyze the resulting data. A new generation of glycoproteomics search methods has recently emerged, using glycan fragmentation to split the identification of a glycopeptide into peptide and glycan components and solve each component separately. In this review, we discuss these new methods and their implications for large-scale glycoproteomics, as well as several outstanding challenges in glycoproteomics data analysis, including validation of glycan assignments and quantitation. Finally, we provide an outlook on the future of glycoproteomics from an informatics perspective, noting the key challenges to achieving widespread and reproducible glycopeptide annotation and quantitation.
- Published
- 2022
5. Identification of secreted proteins by comparison of protein abundance in conditioned media and cell lysates
- Author
Prabhodh S. Abbineni, Vi T. Tang, Felipe da Veiga Leprevost, Venkatesha Basrur, Jie Xiang, Alexey I. Nesvizhskii, and David Ginsburg
- Subjects
Brefeldin A; Culture Media, Conditioned; Biophysics; Golgi Apparatus; Proteins; Cell Biology; Molecular Biology; Biochemistry
- Abstract
Analysis of the full spectrum of secreted proteins in cell culture is complicated by leakage of intracellular proteins from damaged cells. To address this issue, we compared the abundance of individual proteins between the cell lysate and the conditioned medium, reasoning that secreted proteins should be relatively more abundant in the conditioned medium. Marked enrichment for signal-peptide-bearing proteins with increasing conditioned media to cell lysate ratio, as well as loss of this signal following brefeldin A treatment, confirmed the sensitivity and specificity of this approach. The subset of proteins demonstrating increased conditioned media to cell lysate ratio in the presence of brefeldin A identified candidates for unconventional secretion via a pathway independent of ER to Golgi trafficking.
- Published
- 2022
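The comparison described in the abstract of entry 5 (abundance in conditioned medium versus cell lysate) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `secretion_scores`, the pseudocount, the protein names, and the abundance values are all hypothetical and chosen only to show the idea that secreted proteins should have a higher medium-to-lysate ratio.

```python
import math


def secretion_scores(medium, lysate, pseudocount=1.0):
    """Return log2(conditioned medium / cell lysate) for proteins seen in both.

    `medium` and `lysate` map protein name -> abundance (arbitrary units).
    The pseudocount (an assumption) avoids division by zero.
    """
    scores = {}
    for protein in medium.keys() & lysate.keys():
        ratio = (medium[protein] + pseudocount) / (lysate[protein] + pseudocount)
        scores[protein] = math.log2(ratio)
    return scores


# Hypothetical abundances, for illustration only.
medium = {"SECRETED_A": 800.0, "CYTOSOLIC_B": 50.0}
lysate = {"SECRETED_A": 100.0, "CYTOSOLIC_B": 900.0}

scores = secretion_scores(medium, lysate)
# Rank proteins from most to least enriched in the conditioned medium.
ranked = sorted(scores, key=scores.get, reverse=True)
```

A positive score marks a protein that is relatively more abundant in the medium; a leaked intracellular protein would be expected to score negative.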
6. MSFragger-Labile: A Flexible Method to Improve Labile PTM Analysis in Proteomics
- Author
Daniel A. Polasky, Daniel J. Geiszler, Fengchao Yu, Kai Li, Guo Ci Teo, and Alexey I. Nesvizhskii
- Subjects
Molecular Biology; Biochemistry; Analytical Chemistry
- Abstract
Post-translational modifications of proteins play essential roles in defining and regulating the functions of the proteins they decorate, making identification of these modifications critical to understanding biology and disease. Methods for enriching and analyzing a wide variety of biological and chemical modifications of proteins have been developed using mass spectrometry (MS)-based proteomics, largely relying on traditional database search methods to annotate resulting mass spectra of modified peptides. These database search methods treat modifications as static attachments of a mass to a particular position in the peptide sequence, but many modifications undergo fragmentation in tandem MS experiments alongside, or instead of, the peptide backbone. While this fragmentation can confound traditional search methods, it also offers unique opportunities for improved searches that incorporate modification-specific fragment ions. Here, we present a new Labile Mode in the MSFragger search engine that can tailor modification-centric searches to the fragmentation observed. We show that labile mode can dramatically improve spectrum annotation rates of phosphopeptides, RNA-crosslinked peptides, and ADP-ribosylated peptides. Each of these modifications presents distinct fragmentation characteristics, showcasing the flexibility of MSFragger labile mode to improve search for a wide variety of biological and chemical modifications.
- Published
- 2023
7. Fast Deisotoping Algorithm and Its Implementation in the MSFragger Search Engine
- Author
Fengchao Yu, Daniel A. Polasky, Alexey I. Nesvizhskii, and Guo Ci Teo
- Subjects
Proteomics; Computer science; Perspective (graphical); Process (computing); General Chemistry; Biochemistry; Article; Mass Spectrometry; Search Engine; Range (mathematics); Search engine; Preprocessor; Database search engine; Databases, Protein; Algorithm; Algorithms; Software
- Abstract
Deisotoping, or the process of removing peaks in a mass spectrum resulting from the incorporation of naturally occurring heavy isotopes, has long been used to reduce complexity and improve the effectiveness of spectral annotation methods in proteomics. We have previously described MSFragger, an ultrafast search engine for proteomics, that did not utilize deisotoping in processing input spectra. Here, we present a new, high-speed parallelized deisotoping algorithm, based on elements of several existing methods, that we have incorporated into the MSFragger search engine. Applying deisotoping with MSFragger reveals substantial improvements to database search speed and performance, particularly for complex methods like open or nonspecific searches. Finally, we evaluate our deisotoping method on data from several instrument types and vendors, revealing a wide range in performance and offering an updated perspective on deisotoping in the modern proteomics environment.
- Published
- 2020
8. Fast and Comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco
- Author
Alexey I. Nesvizhskii, Daniel A. Polasky, Fengchao Yu, and Guo Ci Teo
- Subjects
chemistry.chemical_classification; 0303 health sciences; Glycan; Proteomics methods; Glycosylation; biology; Extramural; Computer science; Cell Biology; Computational biology; Biochemistry; Mass spectrometric; Glycopeptide; Article; Glycoproteomics; 03 medical and health sciences; chemistry.chemical_compound; chemistry; biology.protein; Phosphorylation; Glycoprotein; Molecular Biology; 030304 developmental biology; Biotechnology
- Abstract
Glycosylation is a ubiquitous and heterogeneous post-translational modification (PTM) used to accomplish a wide variety of critical cellular tasks. Recent advances in methods for enrichment and mass spectrometric analysis of intact glycopeptides have produced large-scale, high-quality glycoproteomics datasets, but interpreting this data remains challenging. In addition to being large, complex, and heterogeneous, glycans undergo fragmentation during vibrational activation, making common PTM search strategies ineffective for their identification. We present a computational tool called MSFragger-Glyco for fast and highly sensitive identification of N- and O-linked glycopeptides using open and glycan mass offset search strategies. Reanalysis of recently published N-glycoproteomics data resulted in annotation of 83% more glycopeptide-spectrum matches (glycoPSMs) than in previous results, which translated to substantial increases in the numbers of glycoproteins and glycosites that could be identified. In published O-glycoproteomics data, our method more than doubled the number of glycoPSMs annotated when searching the same peptides as the original search and resulted in up to a 6-fold increase when expanding searches to include large numbers of possible glycan compositions and other modifications. Expanded searches revealed trends in glycan composition and crosstalk with phosphorylation that remained hidden to the original search. With greatly improved spectral annotation, coupled with the fast speed of fragment ion index-based scoring, MSFragger-Glyco makes it possible to comprehensively interrogate glycoproteomics data and illuminate the many roles of glycosylation.
- Published
- 2020
9. Kir2.1 Interactome Mapping Uncovers PKP4 as a Modulator of the Kir2.1-Regulated Inward Rectifier Potassium Currents
- Author
Venkatesha Basrur, Rork Kuick, José Jalife, Guadalupe Guerrero-Serna, Justin Yoon, Alexey I. Nesvizhskii, Jean François Rual, Sung Soo Park, Daniela Ponce-Balbuena, Dattatreya Mellacheruvu, Kevin P. Conlon, National Institutes of Health (United States), National Heart, Lung, and Blood Institute (United States), National Institute of General Medical Sciences (United States), National Cancer Institute (United States), and University of Michigan Rogel Cancer Center (United States)
- Subjects
Patch-Clamp Techniques; Utrophin; macromolecular complex analysis; Kir2.1; cardiovascular function or biology; Regulator; Action Potentials; Computational biology; Biology; Biochemistry; Interactome; Analytical Chemistry; Protein–protein interaction; 03 medical and health sciences; Somatomedins; Tandem Mass Spectrometry; cardiovascular disease; inward rectifier potassium current; Humans; Myocytes, Cardiac; Protein Interaction Maps; Potassium Channels, Inwardly Rectifying; BioID; Molecular Biology; mass spectrometry; PKP4; 030304 developmental biology; Andersen Syndrome; Membrane potential; 0303 health sciences; Inward-rectifier potassium ion channel; Research; 030302 biochemistry & molecular biology; Desmosomes; Protein-protein interactions; Protein Transport; HEK293 Cells; Mutation; Potassium; cardiovascular system; Signal transduction; Lysosomes; Plakophilins; cardiomyopathy; Function (biology); Chromatography, Liquid; Molecular Chaperones; Signal Transduction
- Abstract
A comprehensive map of the Kir2.1 interactome was generated using the proximity-labeling approach BioID. The map encompasses 218 interactions, the vast majority of which are novel, and explores the variations in the interactome profiles of Kir2.1WT versus Kir2.1Δ314-315, a trafficking deficient ATS1 mutant, thus uncovering molecular mechanisms whose malfunctions may underlie ATS1 disease. PKP4, one of the BioID interactors, is validated as a modulator of Kir2.1-controlled inward rectifier potassium currents.
Graphical Abstract Highlights
• Generation using BioID of a map of the Kir2.1 interactome with 218 interactions.
• Identification of Kir2.1WT- versus Kir2.1Δ314-315-preferred interactors.
• Identification of the desmosome protein PKP4 as a new modulator of IKir2.1 currents.
Kir2.1, a strong inward rectifier potassium channel encoded by the KCNJ2 gene, is a key regulator of the resting membrane potential of the cardiomyocyte and plays an important role in controlling ventricular excitation and action potential duration in the human heart. Mutations in KCNJ2 result in inheritable cardiac diseases in humans, e.g. the type-1 Andersen-Tawil syndrome (ATS1). Understanding the molecular mechanisms that govern the regulation of inward rectifier potassium currents by Kir2.1 in both normal and disease contexts should help uncover novel targets for therapeutic intervention in ATS1 and other Kir2.1-associated channelopathies. The information available to date on protein-protein interactions involving Kir2.1 channels remains limited. Additional efforts are necessary to provide a comprehensive map of the Kir2.1 interactome. Here we describe the generation of a comprehensive map of the Kir2.1 interactome using the proximity-labeling approach BioID. Most of the 218 high-confidence Kir2.1 channel interactions we identified are novel and encompass various molecular mechanisms of Kir2.1 function, ranging from intracellular trafficking to cross-talk with the insulin-like growth factor receptor signaling pathway, as well as lysosomal degradation. Our map also explores the variations in the interactome profiles of Kir2.1WT versus Kir2.1Δ314-315, a trafficking deficient ATS1 mutant, thus uncovering molecular mechanisms whose malfunctions may underlie ATS1 disease. Finally, using patch-clamp analysis, we validate the functional relevance of PKP4, one of our top BioID interactors, to the modulation of Kir2.1-controlled inward rectifier potassium currents. Our results validate the power of our BioID approach in identifying functionally relevant Kir2.1 interactors and underline the value of our Kir2.1 interactome as a repository for numerous novel biological hypotheses on Kir2.1 and Kir2.1-associated diseases.
- Published
- 2020
10. Genotype‐phenotype analysis of LMNA‐related diseases predicts phenotype‐selective alterations in lamin phosphorylation
- Author
Eric Lin, Raymond Kwan, M. Bishr Omary, Alexey I. Nesvizhskii, and Graham F. Brady
- Subjects
Male; 0301 basic medicine; congenital, hereditary, and neonatal diseases and abnormalities; Genotype; Context (language use); Laminopathy; Biology; Biochemistry; Article; LMNA; 03 medical and health sciences; 0302 clinical medicine; Genetics; medicine; Humans; Phosphorylation; Molecular Biology; Genetic Association Studies; Progeria; integumentary system; Laminopathies; Lamin Type A; medicine.disease; Phenotype; Cell biology; 030104 developmental biology; Mutation; embryonic structures; Nuclear lamina; Female; 030217 neurology & neurosurgery; Lamin; Biotechnology
- Abstract
Laminopathies are rare diseases associated with mutations in LMNA, which encodes nuclear lamin A/C. LMNA variants lead to diverse tissue-specific phenotypes including cardiomyopathy, lipodystrophy, myopathy, neuropathy, progeria, bone/skin disorders, and overlap syndromes. The mechanisms underlying these heterogeneous phenotypes remain poorly understood, although post-translational modifications, including phosphorylation, are postulated as regulators of lamin function. We catalogued all known lamin A/C human mutations and their associated phenotypes, and systematically examined the putative role of phosphorylation in laminopathies. In silico prediction of specific LMNA mutant-driven changes to lamin A phosphorylation and protein structure was performed using machine learning methods. Some of the predictions we generated were validated via assessment of ectopically expressed wild-type and mutant LMNA. Our findings indicate phenotype- and mutant-specific alterations in lamin phosphorylation, and that some changes in phosphorylation may occur independently of predicted changes in lamin protein structure. Therefore, therapeutic targeting of phosphorylation in the context of laminopathies will likely require mutant- and kinase-specific approaches.
- Published
- 2020
11. Crystal-C: A Computational Tool for Refinement of Open Search Results
- Author
Dmitry M. Avtonomov, Sarah E. Haynes, Alexey I. Nesvizhskii, Andy T. Kong, Felipe da Veiga Leprevost, and Hui Yin Chang
- Subjects
Proteomics; 0301 basic medicine; 030102 biochemistry & molecular biology; business.industry; Computer science; Pattern recognition; General Chemistry; Mass spectrometry; Biochemistry; Article; Characterization (materials science); Crystal (programming language); Data set; 03 medical and health sciences; 030104 developmental biology; Tandem Mass Spectrometry; Liquid chromatography–mass spectrometry; Histogram; Database search engine; Artificial intelligence; Databases, Protein; Peptides; business; Shotgun proteomics; Protein Processing, Post-Translational
- Abstract
Shotgun proteomics using liquid chromatography coupled to mass spectrometry (LC-MS) is commonly used to identify peptides containing post-translational modifications. With the emergence of fast database search tools such as MSFragger, the approach of enlarging precursor mass tolerances during the search (termed "open search") has been increasingly used for comprehensive characterization of post-translational and chemical modifications of protein samples. However, not all mass shifts detected using the open search strategy represent true modifications, as artifacts exist from sources such as unaccounted missed cleavages or peptide co-fragmentation (chimeric MS/MS spectra). Here, we present Crystal-C, a computational tool that detects and removes such artifacts from open search results. Our analysis using Crystal-C shows that, in a typical shotgun proteomics data set, the number of such observations is relatively small. Nevertheless, removing these artifacts helps to simplify the interpretation of the mass shift histograms, which in turn should improve the ability of open search-based tools to detect potentially interesting mass shifts for follow-up investigation.
- Published
- 2020
12. GRASP55 regulates the unconventional secretion and aggregation of mutant huntingtin
- Author
Erpan Ahat, Sarah Bui, Jianchao Zhang, Felipe da Veiga Leprevost, Lisa Sharkey, Whitney Reid, Alexey I. Nesvizhskii, Henry L. Paulson, and Yanzhuang Wang
- Subjects
Huntingtin Protein; Mice; Autophagosomes; Animals; Golgi Apparatus; Golgi Matrix Proteins; Humans; Neurodegenerative Diseases; Cell Biology; Lysosomes; Molecular Biology; Biochemistry
- Abstract
Recent studies demonstrated that the Golgi reassembly stacking proteins (GRASPs), especially GRASP55, regulate Golgi-independent unconventional secretion of certain cytosolic and transmembrane cargoes; however, the underlying mechanism remains unknown. Here, we surveyed several neurodegenerative disease-related proteins, including mutant huntingtin (Htt-Q74), superoxide dismutase 1 (SOD1), tau, and TAR DNA-binding protein 43 (TDP-43), for unconventional secretion; our results show that Htt-Q74 is most robustly secreted in a GRASP55-dependent manner. Using Htt-Q74 as a model system, we demonstrate that unconventional secretion of Htt is GRASP55 and autophagy dependent and is enhanced under stress conditions such as starvation and endoplasmic reticulum stress. Mechanistically, we show that GRASP55 facilitates Htt secretion by tethering autophagosomes to lysosomes to promote autophagosome maturation and subsequent lysosome secretion and by stabilizing p23/TMED10, a channel for translocation of cytoplasmic proteins into the lumen of the endoplasmic reticulum-Golgi intermediate compartment. Moreover, we found that GRASP55 levels are upregulated by various stresses to facilitate unconventional secretion, whereas inhibition of Htt-Q74 secretion by GRASP55 KO enhances Htt aggregation and toxicity. Finally, comprehensive secretomic analysis identified novel cytosolic cargoes secreted by the same unconventional pathway, including transgelin (TAGLN), multifunctional protein ADE2 (PAICS), and peroxiredoxin-1 (PRDX1). In conclusion, this study defines the pathway of GRASP55-mediated unconventional protein secretion and provides important insights into the progression of Huntington's disease.
- Published
- 2021
13. SP3-Enabled Rapid and High Coverage Chemoproteomic Identification of Cell-State-Dependent Redox-Sensitive Cysteines
- Author
Heta S. Desai, Tianyang Yan, Fengchao Yu, Alexander W. Sun, Miranda Villanueva, Alexey I. Nesvizhskii, and Keriann M. Backus
- Subjects
Proteomics; Biochemistry & Molecular Biology; Proteome; Underpinning research; 1.1 Normal biological development and functioning; Generic health relevance; Cysteine; Molecular Biology; Biochemistry; Oxidation-Reduction; Mass Spectrometry; Analytical Chemistry; Biotechnology
- Abstract
Proteinaceous cysteine residues act as privileged sensors of oxidative stress. As reactive oxygen and nitrogen species have been implicated in numerous pathophysiological processes, deciphering which cysteines are sensitive to oxidative modification and the specific nature of these modifications is essential to understanding protein and cellular function in health and disease. While established mass spectrometry-based proteomic platforms have improved our understanding of the redox proteome, the widespread adoption of these methods is often hindered by complex sample preparation workflows, prohibitive cost of isotopic labeling reagents, and requirements for custom data analysis workflows. Here, we present the SP3-Rox redox proteomics method that combines tailored low cost isotopically labeled capture reagents with SP3 sample cleanup to achieve high throughput and high coverage proteome-wide identification of redox-sensitive cysteines. By implementing a customized workflow in the free FragPipe computational pipeline, we achieve accurate MS1-based quantitation, including for peptides containing multiple cysteine residues. Application of the SP3-Rox method to cellular proteomes identified cysteines sensitive to the oxidative stressor GSNO and cysteine oxidation state changes that occur during T cell activation.
- Published
- 2021
14. DeltaMass: Automated Detection and Visualization of Mass Shifts in Proteomic Open-Search Results
- Author
Andy T. Kong, Alexey I. Nesvizhskii, and Dmitry M. Avtonomov
- Subjects
Proteomics; 0301 basic medicine; Computer science; Biochemistry; Article; Automation; 03 medical and health sciences; Software; Data visualization; Database search engine; Databases, Protein; Graphical user interface; Data processing; 030102 biochemistry & molecular biology; business.industry; Window (computing); Pattern recognition; General Chemistry; Visualization; Molecular Weight; Identification (information); 030104 developmental biology; Artificial intelligence; business; Protein Processing, Post-Translational
- Abstract
Routine identification of thousands of proteins in a single LC-MS experiment has long been the norm. With these vast amounts of data, more rigorous treatment of modified forms of peptides becomes possible. "Open search", a protein database search with a large precursor ion mass tolerance window, is becoming a popular method to evaluate possible sets of post-translational and chemical modifications in samples. The extraction of statistical information about the modifications from peptide search results requires additional effort and data processing, such as recalibration of masses and accurate detection of precursors in MS1 signals. Here we present a software tool, DeltaMass, which performs kernel-density-based estimation of observed mass shifts and allows for the detection of poorly resolved mass deltas. The software also maps observed mass shifts to known modifications from public databases such as UniMod and augments them with additionally generated possible chemical changes to the molecule. Its interactive graphical interface provides an effective option for the visual interrogation of the data and the identification of potentially interesting mass shifts or unusual artifacts for subsequent analysis. The program can also be used in fully automated command-line mode to generate mass-shift peak lists.
- Published
- 2018
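The kernel-density-based estimation of mass shifts mentioned in the abstract of entry 14 can be sketched in a few lines. This is an illustration of the general technique only, not DeltaMass's actual implementation: the Gaussian kernel, the bandwidth, the density threshold, the function names, and the sample mass shifts are all assumptions made for the example.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def find_peaks(grid, density, min_density):
    """Return grid positions that are local maxima above `min_density`."""
    peaks = []
    for i in range(1, len(grid) - 1):
        is_max = density[i] > density[i - 1] and density[i] >= density[i + 1]
        if is_max and density[i] >= min_density:
            peaks.append(grid[i])
    return peaks


# Hypothetical open-search mass shifts (Da): a cluster near 0, a cluster
# near +15.995, and one isolated observation that should be filtered out.
mass_shifts = [-0.002, 0.000, 0.001, 0.002, 15.993, 15.995, 15.996, 0.984]

# Evaluation grid from -1 to 17 Da in 0.001 Da steps (assumed resolution).
grid = [round(-1.0 + 0.001 * i, 3) for i in range(18001)]
density = gaussian_kde(mass_shifts, grid, bandwidth=0.005)

# The threshold is set above the height of a single isolated observation.
peaks = find_peaks(grid, density, min_density=15.0)
```

Smoothing before peak picking is what lets nearby observations merge into a single mass-shift peak, while the density threshold suppresses shifts supported by only one match.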
15. MassIVE.quant: a community resource of quantitative mass spectrometry–based proteomics datasets
- Author
Ting Huang, Erik Verschueren, Yasset Perez-Riverol, Bernd Wollscheid, Ruth Hüttenhain, Jeremy Carver, Eduard Sabidó, Olga Vitek, Tom Dunkley, Guo Ci Teo, Oliver M. Bernhardt, Benjamin Pullman, Nuno Bandeira, Maria P. Pavlou, Lukas Reiter, Meena Choi, Cristina Chiva, Jan Muntel, Manuel Tzouros, Tsung-Heng Tsai, Alexey I. Nesvizhskii, Maik Müller, and Sandra Goetze
- Subjects
Proteomics; Computer science; Saccharomyces cerevisiae; Information repository; computer.software_genre; Mass spectrometry; Data publication and archiving; Biochemistry; Article; Mass Spectrometry; Fungal Proteins; 03 medical and health sciences; Data acquisition; Databases, Protein; Molecular Biology; 030304 developmental biology; 0303 health sciences; Reproducibility of Results; Experimental data; Cell Biology; Metadata; Workflow; ComputingMethodologies_PATTERNRECOGNITION; Quantitative analysis (finance); Data mining; computer; Algorithms; Software; Biotechnology
- Abstract
MassIVE.quant is a repository infrastructure and data resource for reproducible quantitative mass spectrometry-based proteomics, which is compatible with all mass spectrometry data acquisition types and computational analysis tools. A branch structure enables MassIVE.quant to systematically store raw experimental data, metadata of the experimental design, scripts of the quantitative analysis workflow, intermediate input and output files, as well as alternative reanalyses of the same dataset. This work was supported in part by NSF CAREER award no. DBI-1054826, grant no. NSF DBI-1759736 and the Chan-Zuckerberg foundation to O.V., grant no. NIH-NLM 1R01LM013115 to N.B. and O.V., NSF award no. ABI 1759980, NIH award nos. P41GM103484 and R24GM127667 to N.B. and the Personalized Health and Related Technologies (grant no. PHRT 0-21411-18) strategic focus area of ETH to B.W. The CRG/UPF Proteomics Unit is part of the Spanish Infrastructure for Omics Technologies (ICTS OmicsTech) and it is a member of the ProteoRed PRB3 consortium that is supported by grant no. PT17/0019 of the PE I+D+i 2013–2016 from the Instituto de Salud Carlos III (ISCIII) and ERDF. We acknowledge support from the Spanish Ministry of Science, Innovation and Universities, ‘Centro de Excelencia Severo Ochoa 2013–2017’, SEV-2012–0208 and Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya (grant no. 2017SGR595). This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 823839 (EPIC-XS). Y.P.-R. acknowledges the Wellcome Trust (grant no. 208391/Z/17/Z).
- Published
- 2020
16. Philosopher: a versatile toolkit for shotgun proteomics data analysis
- Author
Andy T. Kong, Avinash Kumar Shanmugam, Dattatreya Mellacheruvu, Dmitry M. Avtonomov, Felipe da Veiga Leprevost, Sarah E. Haynes, Hui Yin Chang, and Alexey I. Nesvizhskii
- Subjects
Data Analysis; Proteomics; Proteomics methods; Extramural; Computer science; MEDLINE; Computational Biology; Cell Biology; Computational biology; Biochemistry; Article; Shotgun proteomics; Databases, Protein; Molecular Biology; Software; Biotechnology
- Published
- 2020
17. Fast quantitative analysis of timsTOF PASEF data with MSFragger and IonQuant
- Author
Dmitry M. Avtonomov, Guo Ci Teo, Sarah E. Haynes, Daniel A. Polasky, Alexey I. Nesvizhskii, and Fengchao Yu
- Subjects
Proteomics; Accuracy and precision; Proteome; Bioinformatics; Computer science; Feature extraction; Peptide; Saccharomyces cerevisiae; algorithms; Mass spectrometry; Tandem mass spectrometry; 01 natural sciences; Sensitivity and Specificity; Biochemistry; label-free quantification; PASEF; Bottleneck; Analytical Chemistry; 03 medical and health sciences; ion mobility; protein identification; Tandem Mass Spectrometry; High complexity; Ion Mobility Spectrometry; Escherichia coli; Humans; Databases, Protein; Molecular Biology; Phylogeny; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; Multi-core processor; 010401 analytical chemistry; 030302 biochemistry & molecular biology; Ms analysis; Technological Innovation and Resources; 0104 chemical sciences; Label-free quantification; chemistry; Peptides; Protein Processing, Post-Translational; Algorithm; Chromatography, Liquid; HeLa Cells
- Abstract
Ion mobility helps resolve complex proteomics samples, but data structures can be unwieldy and lead to long post-acquisition analysis times. We adapted the fast search engine MSFragger for timsTOF data, and developed IonQuant for accurate quantification. These tools are part of a complete pipeline that is well suited for the analysis of timsTOF in terms of identification sensitivity, quantification accuracy, and runtimes. We additionally demonstrate complex analyses, including semi-enzymatic database search to monitor gas-phase fragmentation in early timsTOF data.
Graphical Abstract Highlights
• MSFragger now supports raw timsTOF PASEF data.
• IonQuant performs fast and accurate feature detection and quantification.
• MSFragger and IonQuant provide excellent performance for timsTOF PASEF data.
• Flexibility allows for complex analyses, such as semi-enzymatic and open search.
Ion mobility brings an additional dimension of separation to LC–MS, improving identification of peptides and proteins in complex mixtures. A recently introduced timsTOF mass spectrometer (Bruker) couples trapped ion mobility separation to TOF mass analysis. With the parallel accumulation serial fragmentation (PASEF) method, the timsTOF platform achieves promising results, yet analysis of the data generated on this platform represents a major bottleneck. Currently, MaxQuant and PEAKS are most used to analyze these data. However, because of the high complexity of timsTOF PASEF data, both require substantial time to perform even standard tryptic searches. Advanced searches (e.g. with many variable modifications, semi- or non-enzymatic searches, or open searches for post-translational modification discovery) are practically impossible. We have extended our fast peptide identification tool MSFragger to support timsTOF PASEF data, and developed a label-free quantification tool, IonQuant, for fast and accurate 4-D feature extraction and quantification. Using a HeLa data set published by Meier et al. (2018), we demonstrate that MSFragger identifies significantly (∼30%) more unique peptides than MaxQuant (1.6.10.43), and performs comparably or better than PEAKS X+ (∼10% more peptides). IonQuant outperforms both in terms of number of quantified proteins while maintaining good quantification precision and accuracy. Runtime tests show that MSFragger and IonQuant can fully process a typical two-hour PASEF run in under 70 min on a typical desktop (6 CPU cores, 32 GB RAM), significantly faster than other tools. Finally, through semi-enzymatic searching, we significantly increase the number of identified peptides. Within these semi-tryptic identifications, we report evidence of gas-phase fragmentation before MS/MS analysis.
- Published
- 2020
18. Determining Allele-Specific Protein Expression (ASPE) Using a Novel Quantitative Concatamer Based Proteomics Method
- Author
-
Hao Jie Zhu, Hui Jiang, Danxin Wang, Jian Shi, Huaijun Zhu, Xinwen Wang, and Alexey I. Nesvizhskii
- Subjects
0301 basic medicine ,chemistry.chemical_classification ,Nonsynonymous substitution ,Mutant ,Peptide ,Heterozygote advantage ,General Chemistry ,Computational biology ,Biology ,Proteomics ,030226 pharmacology & pharmacy ,Biochemistry ,Protein expression ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,chemistry ,Allele ,Allele specific - Abstract
Measuring allele-specific expression (ASE) is a powerful approach for identifying cis-regulatory genetic variants. Here, we developed a novel targeted proteomics method for the quantification of allele-specific protein expression (ASPE) based on scheduled parallel reaction monitoring (PRM) with a heavy stable isotope-labeled quantitative concatamer (QconCAT) internal protein standard. This strategy was applied to the determination of the ASPE of UGT2B15 in human livers using the common UGT2B15 nonsynonymous variant rs1902023 (i.e., Y85D) as the marker to differentiate expressions from the two alleles. The QconCAT standard contains both the wild-type tryptic peptide and the Y85D mutant peptide at a ratio of 1:1 to ensure accurate measurement of the ASPE of UGT2B15. The results from 18 UGT2B15 Y85D heterozygotes revealed that the ratios between the wild-type Y allele and the mutant D allele varied from 0.60 to 1.46, indicating the presence of cis-regulatory variants. In addition, we observed no significant ...
- Published
- 2018
19. The Ewing Sarcoma Secretome and Its Response to Activation of Wnt/beta-catenin Signaling
- Author
-
Alexey I. Nesvizhskii, Felipe da Veiga Leprevost, Colin Sperring, Elisabeth A. Pedersen, Venkatesha Basrur, Allegra G. Hawkins, and Elizabeth R. Lawlor
- Subjects
0301 basic medicine ,Proteome ,Sarcoma, Ewing ,Biochemistry ,Mass Spectrometry ,Analytical Chemistry ,Extracellular matrix ,03 medical and health sciences ,Cell Line, Tumor ,Wnt3A Protein ,Tumor Microenvironment ,medicine ,Humans ,Wnt Signaling Pathway ,Molecular Biology ,Tumor microenvironment ,biology ,Research ,Tenascin C ,Wnt signaling pathway ,medicine.disease ,Extracellular Matrix ,Neoplasm Proteins ,Up-Regulation ,030104 developmental biology ,Tumor progression ,biology.protein ,Cancer research ,Sarcoma ,WNT3A - Abstract
Tumor: tumor microenvironment (TME) interactions are critical for tumor progression and the composition and structure of the local extracellular matrix (ECM) are key determinants of tumor metastasis. We recently reported that activation of Wnt/beta-catenin signaling in Ewing sarcoma cells induces widespread transcriptional changes that are associated with acquisition of a metastatic tumor phenotype. Significantly, ECM protein-encoding genes were found to be enriched among Wnt/beta-catenin induced transcripts, leading us to hypothesize that activation of canonical Wnt signaling might induce changes in the Ewing sarcoma secretome. To address this hypothesis, conditioned media from Ewing sarcoma cell lines cultured in the presence or absence of Wnt3a was collected for proteomic analysis. Label-free mass spectrometry was used to identify and quantify differentially secreted proteins. We then used in silico databases to identify only proteins annotated as secreted. Comparison of the secretomes of two Ewing sarcoma cell lines revealed numerous shared proteins, as well as a degree of heterogeneity, in both basal and Wnt-stimulated conditions. Gene set enrichment analysis of secreted proteins revealed that Wnt stimulation reproducibly resulted in increased secretion of proteins involved in ECM organization, ECM receptor interactions, and collagen formation. In particular, Wnt-stimulated Ewing sarcoma cells up-regulated secretion of structural collagens, as well as matricellular proteins, such as the metastasis-associated protein, tenascin C (TNC). Interrogation of published databases confirmed reproducible correlations between Wnt/beta-catenin activation and TNC and COL1A1 expression in patient tumors. In summary, this first study of the Ewing sarcoma secretome reveals that Wnt/beta-catenin activated tumor cells upregulate secretion of ECM proteins. 
Such Wnt/beta-catenin mediated changes are likely to impact on tumor: TME interactions that contribute to metastatic progression.
- Published
- 2018
20. The antiviral enzyme viperin inhibits cholesterol biosynthesis
- Author
-
E. Neil G. Marsh, Ayesha M. Patel, James Windak, Robert T. Kennedy, Youngsoo Kim, Keerthi Sajja, Alexey I. Nesvizhskii, Timothy J. Grunkemeyer, Soumi Ghosh, and Venkatesha Basrur
- Subjects
0301 basic medicine ,DAVID, Database for Annotation, Visualization and Integrated Discovery ,Biochemistry ,Cell membrane ,radical SAM enzyme ,Intramolecular Transferases ,Lipid raft ,biology ,Chemistry ,ddhCTP, 3’-deoxy-3’,4’-didehydro-CTP ,Transfection ,SAINT, Significance Analysis of INTeractome ,Cell biology ,Cholesterol ,medicine.anatomical_structure ,viperin, Virus Inhibitory Protein, Endoplasmic Reticulum-associated, Interferon iNducible ,Viperin ,cholesterol regulation ,viperin ,Research Article ,FPPS, farnesyl pyrophosphate synthase ,Protein Binding ,Oxidoreductases Acting on CH-CH Group Donors ,Cytidine Triphosphate ,BIBB 515, (1-(4-chlorobenzoyl)-4-((4-(2-oxazolin-2-yl) benzylidene))piperidine) ,Antiviral Agents ,ER, endoplasmic reticulum ,03 medical and health sciences ,Downregulation and upregulation ,medicine ,Humans ,IFN, interferon ,Molecular Biology ,BHT, 2,6-di-tert-butyl-4-methylphenol ,PEI, polyethyleneimine ,HMGR, 3-hydroxy-3-methylglutaryl CoA reductase ,030102 biochemistry & molecular biology ,Endoplasmic reticulum ,HEK 293 cells ,LS, lanosterol synthase ,Proteins ,Cell Biology ,squalene monooxygenase ,HEK293T, human embryonic kidney 293T ,Biosynthetic Pathways ,TBS, Tris-buffered saline ,HEK293 Cells ,030104 developmental biology ,biology.protein ,interactome analysis ,UPLC, ultra performance LC ,RSV, respiratory syncytial virus ,SM, squalene monooxygenase ,lanosterol synthase ,TRAF6, TNF receptor–associated factor 6 ,Lanosterol synthase - Abstract
Many enveloped viruses bud from cholesterol-rich lipid rafts on the cell membrane. Depleting cellular cholesterol impedes this process and results in viral particles with reduced viability. Viperin (Virus Inhibitory Protein, Endoplasmic Reticulum-associated, Interferon iNducible) is an endoplasmic reticulum membrane-associated enzyme that exerts broad-ranging antiviral effects, including inhibiting the budding of some enveloped viruses. However, the relationship between viperin expression and the retarded budding of virus particles from lipid rafts on the cell membrane is unclear. Here, we investigated the effect of viperin expression on cholesterol biosynthesis using transiently expressed genes in the human cell line human embryonic kidney 293T (HEK293T). We found that viperin expression reduces cholesterol levels by 20% to 30% in these cells. Following this observation, a proteomic screen of the viperin interactome identified several cholesterol biosynthetic enzymes among the top hits, including lanosterol synthase (LS) and squalene monooxygenase (SM), which are enzymes that catalyze key steps in establishing the sterol carbon skeleton. Coimmunoprecipitation experiments confirmed that viperin, LS, and SM form a complex at the endoplasmic reticulum membrane. While coexpression of viperin was found to significantly inhibit the specific activity of LS in HEK293T cell lysates, coexpression of viperin had no effect on the specific activity of SM, although it did reduce SM protein levels by approximately 30%. Despite these inhibitory effects, the coexpression of neither LS nor SM was able to reverse the viperin-induced depletion of cellular cholesterol levels, possibly because viperin is highly expressed in transfected HEK293T cells. Our results establish a link between viperin expression and downregulation of cholesterol biosynthesis that helps explain viperin's antiviral effects against enveloped viruses.
- Published
- 2021
21. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry–based proteomics
- Author
-
Felipe da Veiga Leprevost, Andy T. Kong, Dattatreya Mellacheruvu, Dmitry M. Avtonomov, and Alexey I. Nesvizhskii
- Subjects
0301 basic medicine ,False discovery rate ,030102 biochemistry & molecular biology ,Computer science ,Search engine indexing ,Cell Biology ,Computational biology ,Tandem mass spectrometry ,Proteomics ,Mass spectrometry ,Bioinformatics ,Biochemistry ,03 medical and health sciences ,Identification (information) ,030104 developmental biology ,Proteome ,Database search engine ,Molecular Biology ,Biotechnology - Abstract
There is a need to better understand and handle the 'dark matter' of proteomics: the vast diversity of post-translational and chemical modifications that are unaccounted for in a typical mass spectrometry-based analysis and thus remain unidentified. We present a fragment-ion indexing method, and its implementation in the peptide identification tool MSFragger, that enables a more than 100-fold improvement in speed over most existing proteome database search tools. Using several large proteomic data sets, we demonstrate how MSFragger empowers the open database search concept for comprehensive identification of peptides and all their modified forms, uncovering dramatic differences in modification rates across experimental samples and conditions. We further illustrate its utility using protein-RNA cross-linked peptide data and using affinity purification experiments where we observe, on average, a 300% increase in the number of identified spectra for enriched proteins. We also discuss the benefits of open searching for improved false discovery rate estimation in proteomics.
- Published
- 2017
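The fragment-ion indexing idea named in the abstract above can be illustrated with a toy sketch. This is not MSFragger's implementation: the bin width, the function names (`fragment_mzs`, `build_index`, `score`), and the shared-peak count used as a score are all assumptions made for illustration, and only unmodified peptides with singly charged b- and y-ions are considered. The point is that an inverted index from fragment m/z bins to peptides lets a spectrum be scored against many candidates by lookup rather than by regenerating each peptide's fragments.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for the amino acids used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728
BIN_WIDTH = 0.02  # hypothetical fragment bin width in Da (an assumption)


def fragment_mzs(peptide):
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    mzs = []
    prefix = 0.0
    for mass in masses[:-1]:
        prefix += mass
        mzs.append(prefix + PROTON)                  # b ion
        mzs.append(total - prefix + WATER + PROTON)  # complementary y ion
    return mzs


def build_index(peptides):
    """Map each fragment m/z bin to the set of peptide ids producing it."""
    index = defaultdict(set)
    for pid, peptide in enumerate(peptides):
        for mz in fragment_mzs(peptide):
            index[int(mz / BIN_WIDTH)].add(pid)
    return index


def score(index, peaks, n_peptides):
    """Count, per peptide, how many observed peaks fall in or next to an indexed bin."""
    counts = [0] * n_peptides
    for mz in peaks:
        center = int(mz / BIN_WIDTH)
        hits = set()
        for neighbor in (center - 1, center, center + 1):
            hits |= index.get(neighbor, set())
        for pid in hits:
            counts[pid] += 1
    return counts


peptides = ["PEPTLDEK", "GASVTLR", "TEAK"]
index = build_index(peptides)
counts = score(index, fragment_mzs("GASVTLR"), len(peptides))
best = counts.index(max(counts))
```

Because no precursor-mass filter is applied before the lookup, the same index can in principle serve a narrow-window or an open (mass-tolerant) search; a real search engine would add precursor filtering, modification handling, and a statistical score.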
22. IonQuant Enables Accurate and Sensitive Label-Free Quantification With FDR-Controlled Match-Between-Runs
- Author
-
Fengchao Yu, Alexey I. Nesvizhskii, and Sarah E. Haynes
- Subjects
Proteomics ,False discovery rate ,LC-MS, liquid chromatography-mass spectrometry ,Saccharomyces cerevisiae Proteins ,FDR, false discovery rate ,Computer science ,Ion-mobility spectrometry ,MBR, match-between-runs ,Quantitative proteomics ,false discovery rates ,DIA, data-independent acquisition ,Mass spectrometry ,single-cell proteomics ,Orbitrap ,Biochemistry ,label-free quantification ,Analytical Chemistry ,law.invention ,03 medical and health sciences ,match-between-runs ,law ,Humans ,Data-independent acquisition ,Databases, Protein ,Molecular Biology ,mass spectrometry ,030304 developmental biology ,0303 health sciences ,Escherichia coli Proteins ,030302 biochemistry & molecular biology ,Technological Innovation and Resources ,CV, coefficient of variation ,LFQ, label-free quantification ,Proteins ,FAIMS, high-field asymmetric ion mobility spectrometry ,PSM, peptide-spectrum match ,Mixture model ,Missing data ,Label-free quantification ,DDA, data-dependent acquisition ,Single-Cell Analysis ,Peptides ,Biological system ,LDA, linear discriminant analysis ,Algorithms ,Software ,HeLa Cells - Abstract
Missing values weaken the power of label-free quantitative proteomic experiments to uncover true quantitative differences between biological samples or experimental conditions. Match-between-runs (MBR) has become a common approach to mitigate the missing value problem, where peptides identified by tandem mass spectra in one run are transferred to another by inference based on m/z, charge state, retention time, and ion mobility when applicable. Though tolerances are used to ensure such transferred identifications are reasonably located and meet certain quality thresholds, little work has been done to evaluate the statistical confidence of MBR. Here, we present a mixture model-based approach to estimate the false discovery rate (FDR) of peptide and protein identification transfer, which we implement in the label-free quantification tool IonQuant. Using several benchmarking datasets generated on both Orbitrap and timsTOF mass spectrometers, we demonstrate superior performance of IonQuant with FDR-controlled MBR compared with MaxQuant (19–38 times faster; 6–18% more proteins quantified and with comparable or better accuracy). We further illustrate the performance of IonQuant and highlight the need for FDR-controlled MBR, in two single-cell proteomics experiments, including one acquired with the help of high-field asymmetric ion mobility spectrometry separation. Fully integrated in the FragPipe computational environment, IonQuant with FDR-controlled MBR enables fast and accurate peptide and protein quantification in label-free proteomics experiments. Highlights: • A mixture-model approach controls the false discovery rate of match-between-runs. • The method is implemented in IonQuant. • Experiments with various data types show high sensitivity and accuracy of IonQuant. In Brief: Match-between-runs is a powerful approach to mitigate the missing value problem in label-free quantification. It transfers features identified by MS/MS from one run to the other, but previously, there was no false discovery rate control over this process. We present a mixture model–based approach to estimate and control the false discovery rate, which we have implemented in IonQuant. We demonstrate the sensitivity, accuracy, and speed of IonQuant using proteomics data from timsTOF, Orbitrap, and Orbitrap coupled to FAIMS.
- Published
- 2021
23. Data Independent Acquisition analysis in ProHits 4.0
- Author
-
Chih-Chiang Tsou, Jianping Zhang, James D.R. Knight, Jian Wang, Alexey I. Nesvizhskii, Anne-Claude Gingras, Brett Larsen, Guomin Liu, Nuno Bandeira, Mike Tyers, Brian Raught, Jean-Philippe Lambert, and Hyungwon Choi
- Subjects
Proteomics ,0301 basic medicine ,Biophysics ,Biology ,computer.software_genre ,Bioinformatics ,Biochemistry ,Interactome ,Chromatography, Affinity ,Mass Spectrometry ,Article ,03 medical and health sciences ,Software ,Protein Interaction Mapping ,Data-independent acquisition ,Databases, Protein ,business.industry ,Proteins ,Pipeline (software) ,Visualization ,Identification (information) ,030104 developmental biology ,Workflow ,Data mining ,Peptides ,business ,computer - Abstract
Affinity purification coupled with mass spectrometry (AP-MS) is a powerful technique for the identification and quantification of physical interactions. AP-MS requires careful experimental design, appropriate control selection and quantitative workflows to successfully identify bona fide interactors amongst a large background of contaminants. We previously introduced ProHits, a Laboratory Information Management System for interaction proteomics, which tracks all samples in a mass spectrometry facility, initiates database searches and provides visualization tools for spectral counting-based AP-MS approaches. More recently, we implemented Significance Analysis of INTeractome (SAINT) within ProHits to provide scoring of interactions based on spectral counts. Here, we provide an update to ProHits to support Data Independent Acquisition (DIA) with identification software (DIA-Umpire and MSPLIT-DIA), quantification tools (through DIA-Umpire, or externally via targeted extraction), and assessment of quantitative enrichment (through mapDIA) and scoring of interactions (through SAINT-intensity). With additional improvements, notably support of the iProphet pipeline, facilitated deposition into ProteomeXchange repositories and enhanced export and viewing functions, ProHits 4.0 offers a comprehensive suite of tools to facilitate affinity proteomics studies. Significance: It remains challenging to score, annotate and analyze proteomics data in a transparent manner. ProHits was previously introduced as a LIMS to enable storing, tracking and analysis of standard AP-MS data. In this revised version, we expand ProHits to include integration with a number of identification and quantification tools based on Data-Independent Acquisition (DIA). ProHits 4.0 also facilitates data deposition into public repositories, and the transfer of data to new visualization tools.
- Published
- 2016
24. Response to the Comments on 'Determining Allele-Specific Protein Expression (ASPE) Using a Novel Quantitative Concatamer Proteomics Method'
- Author
-
Jian Shi, Hui Jiang, Xinwen Wang, Alexey I. Nesvizhskii, Hao Jie Zhu, Huaijun Zhu, and Danxin Wang
- Subjects
Proteomics ,chemistry.chemical_classification ,Proteome ,Mutant ,General Chemistry ,Biology ,Trypsin ,Biochemistry ,Phenotype ,Amino acid ,chemistry ,medicine ,Allele ,Peptides ,Gene ,Alleles ,medicine.drug - Abstract
Russell and colleagues deserve credit for being the first to use a QconCAT standard to simultaneously quantify both the wild-type and mutant peptides of a protein (i.e., CYP2B6) (J. Proteome Res. 2013, 12 (12), 5934-5942. DOI: 10.1021/pr400279u). However, the rationale of their study was entirely different from ours (J. Proteome Res. 2018, 17 (10), 3606-3612. DOI: 10.1021/acs.jproteome.8b00620). Their study focused on the quantification of individual drug-metabolizing enzymes and transporters, whereas ours developed a targeted proteomics method to determine the allele-specific protein expression (ASPE) of a gene and advocated the use of the ASPE imbalance as the phenotype for identifying cis-regulatory genetic variants of the gene. More importantly, the digestion enzyme trypsin interacts with three to four amino acid residues around scissile bonds, and certain residues, such as negatively charged amino acids, can significantly affect the digestion efficiency. The QconCAT standard reported in our study differs from conventional QconCAT standards such as that used by Russell et al. in that at least 15 native flanking amino acids were included to ensure accurate measurement of ASPE ratios.
- Published
- 2019
25. Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics
- Author
-
Alexey I. Nesvizhskii and Avinash Kumar Shanmugam
- Subjects
Sequence database ,Peptide spectral library ,Proteome ,RefSeq ,Ensembl ,General Chemistry ,Computational biology ,UniProt ,Biology ,Bioinformatics ,Proteomics ,Shotgun proteomics ,Biochemistry - Abstract
In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the Global Proteome Machine Database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential search space incompleteness in the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
- Published
- 2015
26. mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry
- Author
-
Anne-Claude Gingras, Guoshou Teo, Sinae Kim, Chih-Chiang Tsou, Alexey I. Nesvizhskii, Hyungwon Choi, and Ben C. Collins
- Subjects
Proteomics ,Normalization (statistics) ,Proteome ,Computer science ,Quantitative proteomics ,Biophysics ,Data preprocessing ,computer.software_genre ,Biochemistry ,Article ,Mass Spectrometry ,Set (abstract data type) ,Differential expression ,SDG 3 - Good Health and Well-being ,Sequence Analysis, Protein ,Protein Interaction Mapping ,Preprocessor ,Computer Simulation ,Data-independent acquisition ,Amino Acid Sequence ,Models, Statistical ,Gene Expression Profiling ,Normalization ,Data Interpretation, Statistical ,Outlier ,Data mining ,Data pre-processing ,computer ,Data independent acquisition - Abstract
Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for the 14-3-3β dynamic interaction network and the prostate cancer glycoproteome. Availability: The software was written in C++ and the source code is available for free through the SourceForge website http://sourceforge.net/projects/mapdia/. This article is part of a Special Issue entitled: Computational Proteomics.
- Published
- 2015
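The first normalization option mentioned in the mapDIA abstract, scaling fragment intensities by total intensity sums, can be sketched as follows. This is a minimal illustration and not mapDIA's code (which is written in C++): the function name `normalize_by_total_intensity`, the dictionary layout, and the choice of the mean total as the common target are assumptions, and missing values are not handled.

```python
def normalize_by_total_intensity(intensities):
    """Scale each sample so that its summed intensity equals the mean total.

    intensities: dict mapping a sample name to a list of fragment intensities,
    with the fragments in the same order for every sample (assumed layout).
    """
    totals = {sample: sum(values) for sample, values in intensities.items()}
    target = sum(totals.values()) / len(totals)  # assumed common target
    return {
        sample: [value * target / totals[sample] for value in values]
        for sample, values in intensities.items()
    }


raw = {
    "sample_A": [100.0, 200.0, 300.0],
    "sample_B": [200.0, 400.0, 600.0],  # same pattern, twice the loading
}
normalized = normalize_by_total_intensity(raw)
```

In this example the two samples differ only by a global factor of two, so after normalization their fragment intensities coincide. The local normalization in retention time space described in the abstract would apply the same idea within retention-time windows rather than over the whole run.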
27. QPROT: Statistical method for testing differential expression using protein-level intensity data in label-free quantitative proteomics
- Author
-
Sinae Kim, Hyungwon Choi, Chih-Chiang Tsou, Alexey I. Nesvizhskii, and Damian Fermin
- Subjects
False discovery rate ,Models, Statistical ,Software suite ,Proteome ,Staining and Labeling ,Computer science ,Gene Expression Profiling ,Molecular Sequence Data ,Quantitative proteomics ,Posterior probability ,Biophysics ,Missing data ,computer.software_genre ,Biochemistry ,Article ,Sequence Analysis, Protein ,Data Interpretation, Statistical ,Protein Expression Analysis ,Computer Simulation ,Amino Acid Sequence ,Data mining ,computer ,Algorithms ,Empirical Bayes method ,Count data - Abstract
We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for the analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression of each protein is based on the standardized Z-statistic based on the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known Empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. Biological significance: QPROT is a statistical framework with an accompanying computational software tool for comparative quantitative proteomics analysis. It features various extensions of the QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics.
- Published
- 2015
28. Two-pass alignment improves novel splice junction quantification
- Author
-
Saravana M. Dhanasekaran, Brendan A. Veeneman, Sudhanshu Shukla, Arul M. Chinnaiyan, and Alexey I. Nesvizhskii
- Subjects
Statistics and Probability ,Base Sequence ,Computer science ,RNA Splicing ,Read depth ,Computational biology ,Original Papers ,Biochemistry ,Genome ,Computer Science Applications ,Transcriptome Sequencing ,Transcriptome ,Computational Mathematics ,Computational Theory and Mathematics ,Cell Line, Tumor ,RNA splicing ,RNA Sequence ,Splice junction ,Humans ,RNA Splice Sites ,Databases, Nucleic Acid ,Sequence Alignment ,Molecular Biology - Abstract
Motivation: Discovery of novel splicing from RNA sequence data remains a critical and exciting focus of transcriptomics, but reduced alignment power impedes expression quantification of novel splice junctions. Results: Here, we profile performance characteristics of two-pass alignment, which separates splice junction discovery from quantification. Per sample, across a variety of transcriptome sequencing datasets, two-pass alignment improved quantification of at least 94% of simulated novel splice junctions, and provided as much as 1.7-fold deeper median read depth over those splice junctions. We further demonstrate that two-pass alignment works by increasing alignment of reads to splice junctions by short lengths, and that potential alignment errors are readily identifiable by simple classification. Taken together, two-pass alignment promises to advance quantification and discovery of novel splicing events. Contact: arul@med.umich.edu, nesvi@med.umich.edu Availability and implementation: Two-pass alignment was implemented here as sequential alignment, genome indexing, and re-alignment steps with STAR. Full parameters are provided in Supplementary Table 2. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2015
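The three steps named in the abstract above (alignment, genome re-indexing with the discovered junctions, and re-alignment with STAR) can be laid out as command lines. The sketch below only builds the argument lists and does not run STAR. The function name, file paths, thread count, and `--sjdbOverhang` value are illustrative assumptions; the authors' actual parameters are those in their Supplementary Table 2 and are not reproduced here.

```python
def two_pass_star_commands(genome_dir, genome_fasta, reads, out_prefix,
                           threads=4, sjdb_overhang=100):
    """Build STAR command lines for a sequential two-pass alignment."""
    # Pass 1: align against the original index to discover splice junctions.
    first_pass = [
        "STAR", "--runMode", "alignReads",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--runThreadN", str(threads),
        "--outFileNamePrefix", f"{out_prefix}pass1_",
    ]
    # STAR writes the junctions it finds to <prefix>SJ.out.tab.
    junctions = f"{out_prefix}pass1_SJ.out.tab"

    # Re-index the genome with the junctions from pass 1 inserted.
    # The output directory may need to exist before STAR is run.
    pass2_genome_dir = f"{out_prefix}genome_pass2"
    reindex = [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", pass2_genome_dir,
        "--genomeFastaFiles", genome_fasta,
        "--sjdbFileChrStartEnd", junctions,
        "--sjdbOverhang", str(sjdb_overhang),
        "--runThreadN", str(threads),
    ]

    # Pass 2: re-align the same reads against the junction-aware index.
    second_pass = [
        "STAR", "--runMode", "alignReads",
        "--genomeDir", pass2_genome_dir,
        "--readFilesIn", *reads,
        "--runThreadN", str(threads),
        "--outFileNamePrefix", f"{out_prefix}pass2_",
    ]
    return [first_pass, reindex, second_pass]


commands = two_pass_star_commands(
    "genome_index", "genome.fa", ["reads_1.fq", "reads_2.fq"], "sampleA_"
)
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order on a machine where STAR is installed. Separating junction discovery (pass 1) from quantification (pass 2) is the design the abstract describes.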
29. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics
- Author
-
Brett Larsen, Chih-Chiang Tsou, Anne-Claude Gingras, Alexey I. Nesvizhskii, Dmitry M. Avtonomov, Hyungwon Choi, and Monika Tucholska
- Subjects
Proteomics ,Quantitative proteomics ,Biology ,Mass spectrometry ,Bioinformatics ,Biochemistry ,Article ,Mass Spectrometry ,Workflow ,Software ,Fragment (logic) ,Humans ,Metabolomics ,Multiplex ,Data-independent acquisition ,Databases, Protein ,Molecular Biology ,business.industry ,Proteins ,Pattern recognition ,Cell Biology ,Peptide Fragments ,Glycoproteomics ,Artificial intelligence ,business ,Algorithms ,Biotechnology - Abstract
Due to recent improvements in mass spectrometry (MS), there is an increased interest in data independent acquisition (DIA) strategies in which all peptides are systematically fragmented using wide mass isolation windows (“multiplex fragmentation”). DIA-Umpire (http://diaumpire.sourceforge.net/), a comprehensive computational workflow and open-source software for DIA data, detects precursor and fragment chromatographic features and assembles them into pseudo MS/MS spectra. These spectra can be identified using conventional database searching and protein inference tools, allowing sensitive untargeted analysis of DIA data without the need for a spectral library. Quantification is obtained using both precursor and fragment ion intensities. Furthermore, DIA-Umpire enables targeted extraction of quantitative information based on peptides initially identified in only a subset of the samples, resulting in more consistent quantification across multiple samples. We demonstrate the performance of the method using control samples of varying complexity, and publicly available glycoproteomics and affinity purification - mass spectrometry data.
- Published
- 2015
30. BioContainers: An open-source and community-driven framework for software standardization
- Author
-
Harald Barsnes, Laurent Gatto, Saulo Alves Aflitos, Marc Vaudel, Mingze Bai, Julianus Pfeuffer, Jonas Weber, Alexey I. Nesvizhskii, Roberto Vera Alvarez, Julian Uszkoreit, Hannes L. Röst, Björn Grüning, Johannes Griss, Felipe da Veiga Leprevost, Yasset Perez-Riverol, Pablo Moreno, Timo Sachsenberg, and Rafael C. Jimenez
- Subjects
Proteomics ,0301 basic medicine ,Statistics and Probability ,Resource-oriented architecture ,Computer science ,Genomics ,Biochemistry ,03 medical and health sciences ,Software ,Metabolomics ,Software verification and validation ,Molecular Biology ,Software measurement ,business.industry ,Software development ,Computational Biology ,computer.file_format ,Applications Notes ,Data science ,Software quality ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Software deployment ,Software construction ,Executable ,business ,Software engineering ,Sequence Analysis ,computer - Abstract
Motivation: BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source frameworks Docker and rkt, which allow software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation: The software is freely available at github.com/BioContainers/.
- Published
- 2017
31. Proteogenomics: concepts, applications and computational strategies
- Author
-
Alexey I. Nesvizhskii
- Subjects
Proteomics ,Genetics ,Proteomics methods ,Proteome ,NeXtProt ,Extramural ,Genetic Variation ,High-Throughput Nucleotide Sequencing ,Genomics ,Cell Biology ,Computational biology ,Biology ,Proteogenomics ,Biochemistry ,Mass Spectrometry ,Article ,Sequence Analysis, Protein ,Protein Isoforms ,Cancer biology ,Databases, Nucleic Acid ,Databases, Protein ,Molecular Biology ,Biotechnology - Abstract
Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry-based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of next generation sequencing technologies such as RNA-Seq and dramatic improvements in the depths and throughput of mass spectrometry-based proteomics, the pace of proteogenomics research has greatly accelerated. Here I review the current state of proteogenomics methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positives in proteogenomics, and provide guidelines for analyzing the data and reporting the results of proteogenomics studies.
- Published
- 2014
32. Utility of RNA-seq and GPMDB Protein Observation Frequency for Improving the Sensitivity of Protein Identification by Tandem MS
- Author
-
Anastasia K. Yocum, Avinash Kumar Shanmugam, and Alexey I. Nesvizhskii
- Subjects
False discovery rate ,Proteomics ,Proteome ,Tandem mass spectrometry ,RNA-Seq ,probability adjustment ,Computational biology ,Biology ,computer.software_genre ,Biochemistry ,Article ,FDR ,03 medical and health sciences ,Cell Line, Tumor ,Humans ,Database search engine ,confidence threshold ,Databases, Protein ,030304 developmental biology ,0303 health sciences ,Models, Statistical ,Sequence Analysis, RNA ,GPMDB ,030302 biochemistry & molecular biology ,Proteins ,Reproducibility of Results ,Statistical model ,General Chemistry ,Identification (information) ,Data mining ,RNA-seq ,computer ,integrative analysis - Abstract
Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for protein identification in proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. In general, these methods do not utilize any information other than spectral data for protein identification. However, considering the wealth of external data available for many biological systems, analysis methods can incorporate such information to improve the sensitivity of protein identification. In this study, we present a method to utilize Global Proteome Machine Database identification frequencies and RNA-seq transcript abundances to adjust the confidence scores of protein identifications. The method described is particularly useful for samples with low-to-moderate proteome coverage (i.e.
- Published
- 2014
33. Fusion Peptides from Oncogenic Chimeric Proteins as Putative Specific Biomarkers of Cancer
- Author
-
Michael J. MacCoss, Kevin P. Conlon, Venkatesha Basrur, Kojo S.J. Elenitoba-Johnson, Alexey I. Nesvizhskii, Delphine Rolland, Thomas C. Wolfe, and Megan S. Lim
- Subjects
Oncogene Proteins, Fusion ,Research ,Cancer ,Heterologous ,Chromosomal translocation ,Computational biology ,Protein-Tyrosine Kinases ,Biology ,medicine.disease ,Biochemistry ,Fusion protein ,Molecular biology ,Mass Spectrometry ,Analytical Chemistry ,Chimeric RNA ,Cell Line, Tumor ,Biomarkers, Tumor ,medicine ,Humans ,Lymphoma, Large-Cell, Anaplastic ,Cancer biomarkers ,Peptides ,Molecular Biology ,Anaplastic large-cell lymphoma ,Gene - Abstract
Chromosomal translocations encoding chimeric fusion proteins constitute one of the most common mechanisms underlying oncogenic transformation in human cancer. Fusion peptides resulting from such oncogenic chimeric fusions, though unique to specific cancer subtypes, are unexplored as cancer biomarkers. Here we show, using an approach termed fusion peptide multiple reaction monitoring mass spectrometry, the direct identification of different cancer-specific fusion peptides arising from protein chimeras that are generated from the juxtaposition of heterologous genes fused by recurrent chromosomal translocations. Using fusion peptide multiple reaction monitoring mass spectrometry in a clinically relevant scenario, we demonstrate the specific, sensitive, and unambiguous detection of a specific diagnostic fusion peptide in clinical samples of anaplastic large cell lymphoma, but not in a diverse array of benign lymph nodes or other forms of primary malignant lymphomas and cancer-derived cell lines. Our studies highlight the utility of fusion peptides as cancer biomarkers and carry broad implications for the use of protein biomarkers in cancer detection and monitoring.
- Published
- 2013
34. Sparsely correlated hidden Markov models with application to genome-wide location studies
- Author
-
Hyungwon Choi, Alexey I. Nesvizhskii, Debashis Ghosh, Damian Fermin, and Zhaohui S. Qin
- Subjects
CD4-Positive T-Lymphocytes ,Statistics and Probability ,Chromatin Immunoprecipitation ,Multivariate statistics ,Transcription, Genetic ,Computation ,Inference ,Biology ,computer.software_genre ,Biochemistry ,Histones ,Humans ,Sensitivity (control systems) ,Hidden Markov model ,Molecular Biology ,Genome ,Models, Statistical ,Series (mathematics) ,Markov chain ,business.industry ,High-Throughput Nucleotide Sequencing ,Pattern recognition ,Statistical model ,Genomics ,Original Papers ,Markov Chains ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Artificial intelligence ,Data mining ,business ,computer ,Algorithms - Abstract
Motivation: Multiply correlated datasets have become increasingly common in genome-wide location analysis of regulatory proteins and epigenetic modifications. Their correlation can be directly incorporated into a statistical model to capture underlying biological interactions, but such modeling quickly becomes computationally intractable. Results: We present sparsely correlated hidden Markov models (scHMM), a novel method for performing simultaneous hidden Markov model (HMM) inference for multiple genomic datasets. In scHMM, a single HMM is assumed for each series, but the transition probability in each series depends on not only its own hidden states but also the hidden states of other related series. For each series, scHMM uses penalized regression to select a subset of the other data series and estimate their effects on the odds of each transition in the given series. Following this, hidden states are inferred using a standard forward–backward algorithm, with the transition probabilities adjusted by the model at each position, which helps retain the order of computation close to fitting independent HMMs (iHMM). Hence, scHMM is a collection of inter-dependent non-homogeneous HMMs, capable of giving a close approximation to a fully multivariate HMM fit. A simulation study shows that scHMM achieves comparable sensitivity to the multivariate HMM fit at a much lower computational cost. The method was demonstrated in the joint analysis of 39 histone modifications, CTCF and RNA polymerase II in human CD4+ T cells. scHMM reported fewer high-confidence regions than iHMM in this dataset, but scHMM could recover previously characterized histone modifications in relevant genomic regions better than iHMM. In addition, the resulting combinatorial patterns from scHMM could be better mapped to the 51 states reported by the multivariate HMM method of Ernst and Kellis. 
Availability: The scHMM package can be freely downloaded from http://sourceforge.net/p/schmm/ and is recommended for use in a linux environment. Contact: ghoshd@psu.edu or zhaohui.qin@emory.edu Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2013
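The scHMM abstract above describes inferring hidden states with a standard forward-backward algorithm in which the transition probabilities are adjusted at each position. A minimal sketch of that inference step follows, assuming the per-position transition matrices have already been computed; the penalized regression that scHMM uses to derive them is not shown. The function name and argument layout are hypothetical and not taken from the scHMM package.

```python
from typing import List, Sequence


def forward_backward(
    init: Sequence[float],
    trans: Sequence[Sequence[Sequence[float]]],
    emis: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Posterior state probabilities for a non-homogeneous HMM.

    init[k]        : probability of state k at position 0
    trans[t][j][k] : probability of moving from state j at position t
                     to state k at position t + 1 (one matrix per step)
    emis[t][k]     : likelihood of the observation at position t in state k
    """
    n, s = len(emis), len(init)

    # Forward pass, rescaled at each position to avoid numerical underflow.
    alpha = [[0.0] * s for _ in range(n)]
    for t in range(n):
        for k in range(s):
            if t == 0:
                prior = init[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[t - 1][j][k] for j in range(s))
            alpha[t][k] = prior * emis[t][k]
        norm = sum(alpha[t])
        alpha[t] = [a / norm for a in alpha[t]]

    # Backward pass, also rescaled at each position.
    beta = [[1.0] * s for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for j in range(s):
            beta[t][j] = sum(
                trans[t][j][k] * emis[t + 1][k] * beta[t + 1][k] for k in range(s)
            )
        norm = sum(beta[t])
        beta[t] = [b / norm for b in beta[t]]

    # Combine both passes and normalize per position.
    post = []
    for t in range(n):
        w = [alpha[t][k] * beta[t][k] for k in range(s)]
        z = sum(w)
        post.append([x / z for x in w])
    return post
```

Because each position gets its own transition matrix, the cost stays linear in the sequence length, which is consistent with the abstract's remark that the order of computation remains close to that of fitting independent HMMs.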
35. BatMass: a Java Software Platform for LC-MS Data Visualization in Proteomics and Metabolomics
- Author
-
Alexander Raskind, Alexey I. Nesvizhskii, and Dmitry M. Avtonomov
- Subjects
0301 basic medicine ,Proteomics ,Quality Control ,Computer science ,computer.software_genre ,Biochemistry ,Mass Spectrometry ,Article ,03 medical and health sciences ,Software ,Data visualization ,Computer Graphics ,Metabolomics ,Data processing ,business.industry ,General Chemistry ,File format ,Visualization ,030104 developmental biology ,Data access ,Data mining ,Mass spectrometry data format ,business ,Raw data ,computer ,Chromatography, Liquid - Abstract
Mass spectrometry (MS) coupled to liquid chromatography (LC) is a commonly used technique in metabolomic and proteomic research. As the size and complexity of LC-MS-based experiments grow, it becomes increasingly difficult to perform quality control of both raw data and processing results. In a practical setting, quality control steps for raw LC-MS data are often overlooked, and assessment of an experiment's success is based on some derived metrics such as "the number of identified compounds". The human brain interprets visual data much better than plain text, hence the saying "a picture is worth a thousand words". Here, we present the BatMass software package, which allows for performing quick quality control of raw LC-MS data through its fast visualization capabilities. It also serves as a testbed for developers of LC-MS data processing algorithms by providing a data access library for open mass spectrometry file formats and a means of visually mapping processing results back to the original data. We illustrate the utility of BatMass with several use cases of quality control and data exploration.
- Published
- 2016
36. A Dual Role for Receptor-interacting Protein Kinase 2 (RIP2) Kinase Activity in Nucleotide-binding Oligomerization Domain 2 (NOD2)-dependent Autophagy
- Author
-
Gabriel Núñez, Arul M. Chinnaiyan, Christine McDonald, Noemí Marina-García, Alexey I. Nesvizhskii, Kourtney P. Nickerson, Arun Sreekumar, Craig R. Homer, and Amrita Kabi
- Subjects
Salmonella typhimurium ,MAP Kinase Signaling System ,p38 mitogen-activated protein kinases ,Nod2 Signaling Adaptor Protein ,Mitogen-activated protein kinase kinase ,Biology ,BAG3 ,MAP Kinase Kinase Kinase 4 ,p38 Mitogen-Activated Protein Kinases ,Biochemistry ,Receptor-Interacting Protein Serine-Threonine Kinase 2 ,Autophagy ,Humans ,ASK1 ,Protein Phosphatase 2 ,Intestinal Mucosa ,Kinase activity ,Molecular Biology ,ATG16L1 ,NF-kappa B ,Epithelial Cells ,Cell Biology ,Autophagy-related protein 13 ,digestive system diseases ,Cell biology ,Enzyme Activation ,HEK293 Cells ,Salmonella Infections ,Signal Transduction - Abstract
Autophagy is triggered by the intracellular bacterial sensor NOD2 (nucleotide-binding, oligomerization domain 2) as an anti-bacterial response. Defects in autophagy have been implicated in Crohn's disease susceptibility. The molecular mechanisms of activation and regulation of this process by NOD2 are not well understood, with recent studies reporting conflicting requirements for RIP2 (receptor-interacting protein kinase 2) in autophagy induction. We examined the requirement of NOD2 signaling mediated by RIP2 for anti-bacterial autophagy induction and clearance of Salmonella typhimurium in the intestinal epithelial cell line HCT116. Our data demonstrate that NOD2 stimulates autophagy in a process dependent on RIP2 tyrosine kinase activity. Autophagy induction requires the activity of the mitogen-activated protein kinases MEKK4 and p38 but is independent of NFκB signaling. Activation of autophagy was inhibited by a PP2A phosphatase complex, which interacts with both NOD2 and RIP2. PP2A phosphatase activity inhibited NOD2-dependent autophagy but not activation of NFκB or p38. Upon stimulation of NOD2, the phosphatase activity of the PP2A complex is inhibited through tyrosine phosphorylation of the catalytic subunit in a process dependent on RIP2 activity. These findings demonstrate that RIP2 tyrosine kinase activity is not only required for NOD2-dependent autophagy but plays a dual role in this process. RIP2 both sends a positive autophagy signal through activation of p38 MAPK and relieves repression of autophagy mediated by the phosphatase PP2A.
- Published
- 2012
37. Comparative Analysis of Different Label-Free Mass Spectrometry Based Protein Abundance Estimates and Their Correlation with RNA-Seq Gene Expression Data
- Author
-
Alexey I. Nesvizhskii, Damian Fermin, and Kang Ning
- Subjects
Proteomics ,Proteome ,Gene Expression Profiling ,Quantitative proteomics ,Computational Biology ,RNA-Seq ,Context (language use) ,General Chemistry ,Computational biology ,Biology ,Bioinformatics ,Biochemistry ,Mass Spectrometry ,Article ,Mitochondrial Proteins ,Gene expression profiling ,Database normalization ,Mice ,Liver ,Gene expression ,Animals ,RNA, Messenger ,Brain Stem - Abstract
An increasing number of studies involve integrative analysis of gene and protein expression data taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Thus, it becomes interesting to revisit the correlative analysis of gene and protein expression data using more recently generated data sets. Furthermore, within the proteomics community there is a substantial interest in comparing the performance of different label-free quantitative proteomic strategies. Gene expression data can be used as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained on the same organism. First, we perform a comparative analysis of different label-free protein quantification methods (intensity based and spectral count based and using various associated data normalization steps) using several software tools on the proteomic side. Similarly, we perform correlative analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. It is observed that spectral count based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies.
- Published
- 2012
38. Proteomic Study of the Mucin Granulae in an Intestinal Goblet Cell Model
- Author
-
Ana M. Rodríguez-Piñeiro, Gunnar C. Hansson, Malin E. V. Johansson, Alexey I. Nesvizhskii, Sjoerd van der Post, and Kristina A. Thomsson
- Subjects
Proteomics ,secretory vesicles ,Mucin 2 ,Biochemistry ,R-SNARE Proteins ,ultracentrifugation ,Synaptotagmins ,VAMP-8 ,0302 clinical medicine ,Protein Interaction Mapping ,Cells, Cultured ,mass spectrometry ,Principal Component Analysis ,0303 health sciences ,goblet cell ,Vesicle ,respiratory system ,Secretory Vesicle ,rab3A GTP-Binding Protein ,Cell biology ,Protein Transport ,multivariate analysis ,medicine.anatomical_structure ,Density gradient ultracentrifugation ,Goblet Cells ,Protein Binding ,Vacuolar Proton-Translocating ATPases ,Vesicle fusion ,Receptors, Cell Surface ,Biology ,digestive system ,Article ,03 medical and health sciences ,Colon, Sigmoid ,Centrifugation, Density Gradient ,medicine ,Humans ,Secretion ,ATP6AP2 ,FAM62B ,030304 developmental biology ,Mucin-2 ,Goblet cell ,Mucin ,mucins ,General Chemistry ,Peptide Fragments ,digestive system diseases ,MUC2 ,030217 neurology & neurosurgery - Abstract
Goblet cells specialize in producing and secreting mucus with its main component, mucins. An inducible goblet-like cell line was used for the purification of the mucus vesicles stored in these cells by density gradient ultracentrifugation, and their proteome was analyzed by nanoLC-MS and MS/MS. Although the density of these vesicles coincides with others, it was possible to reveal a number of proteins that after immunolocalization on colon tissue and functional analyses were likely to be linked to the MUC2 vesicles. Most of the proteins were associated with the vesicle membrane or their outer surface. The ATP6AP2, previously suggested to be associated with vesicular proton pumps, was colocalized with MUC2 without other V-ATPase proteins and, thus, probably has roles in mucin vesicle function yet to be discovered. FAM62B, known to be a calcium-sensitive protein involved in vesicle fusion, also colocalized with the MUC2 vesicles and is probably involved in unknown ways in the later events of the MUC2 vesicles and their secretion.
- Published
- 2012
39. Untargeted, spectral library-free analysis of data-independent acquisition proteomics data generated using Orbitrap mass spectrometers
- Author
-
Guo Ci Teo, Yu-Ju Chen, Chia-Feng Tsai, Chih-Chiang Tsou, and Alexey I. Nesvizhskii
- Subjects
0301 basic medicine ,Proteomics ,Computer science ,Posterior probability ,Analytical chemistry ,Mass spectrometry ,Orbitrap ,Biochemistry ,Mass Spectrometry ,Article ,law.invention ,03 medical and health sciences ,law ,Peptide mass ,Humans ,Data-independent acquisition ,Molecular Biology ,030102 biochemistry & molecular biology ,Human liver ,business.industry ,Computational Biology ,Pattern recognition ,030104 developmental biology ,HEK293 Cells ,Data analysis ,Artificial intelligence ,business ,HeLa Cells - Abstract
We describe an improved version of the data-independent acquisition (DIA) computational analysis tool DIA-Umpire, and show that it enables highly sensitive, untargeted, and direct (spectral library-free) analysis of DIA data obtained using the Orbitrap family of mass spectrometers. DIA-Umpire v2 implements an improved feature detection algorithm with two additional filters based on the isotope pattern and fractional peptide mass analysis. The targeted re-extraction step of DIA-Umpire is updated with an improved scoring function and a more robust, semiparametric mixture modeling of the resulting scores for computing posterior probabilities of correct peptide identification in a targeted setting. Using two publicly available Q Exactive DIA datasets generated using HEK-293 cells and human liver microtissues, we demonstrate that DIA-Umpire can identify a similar number of peptide ions, but with better identification reproducibility between replicates and samples, as with conventional data-dependent acquisition. We further demonstrate the utility of DIA-Umpire using a series of Orbitrap Fusion DIA experiments with HeLa cell lysates profiled using conventional data-dependent acquisition and using DIA with different isolation window widths.
- Published
- 2015
40. PIQED: automated identification and quantification of protein modifications from DIA-MS data
- Author
-
Hanno Steen, Bradford W. Gibson, Sushanth Mukkamalla, Jesse G. Meyer, Alexey I. Nesvizhskii, and Birgit Schilling
- Subjects
Phosphopeptides ,0301 basic medicine ,Computer science ,Computational biology ,Bioinformatics ,Proteomics ,Biochemistry ,Article ,Mass Spectrometry ,Workflow ,03 medical and health sciences ,Text mining ,Databases, Protein ,Molecular Biology ,Extramural ,business.industry ,Cell Biology ,030104 developmental biology ,Post translational ,Protein processing ,Identification (biology) ,Peptides ,business ,Protein Processing, Post-Translational ,Biotechnology - Published
- 2017
41. ProHits-viz: a suite of web tools for visualizing interaction proteomics data
- Author
-
Hyungwon Choi, Alexey I. Nesvizhskii, Anne-Claude Gingras, Gagan D. Gupta, Brian Raught, James D.R. Knight, and Laurence Pelletier
- Subjects
Proteomics ,0301 basic medicine ,Internet ,Computer science ,Suite ,Cell Biology ,Bioinformatics ,Biochemistry ,Data science ,Article ,Mass Spectrometry ,03 medical and health sciences ,030104 developmental biology ,Protein Interaction Mapping ,Databases, Protein ,Molecular Biology ,Software ,Biotechnology - Published
- 2017
42. MSblender: A Probabilistic Approach for Integrating Peptide Identifications from Multiple Database Search Engines
- Author
-
Taejoon Kwon, Christine Vogel, Hyungwon Choi, Edward M. Marcotte, and Alexey I. Nesvizhskii
- Subjects
Proteomics ,Saccharomyces cerevisiae Proteins ,Posterior probability ,Saccharomyces cerevisiae ,Biology ,computer.software_genre ,Biochemistry ,Article ,Search engine ,Software ,Tandem Mass Spectrometry ,Escherichia coli ,Humans ,Database search engine ,Sensitivity (control systems) ,Databases, Protein ,Shotgun proteomics ,Probability ,Models, Statistical ,business.industry ,Escherichia coli Proteins ,Probabilistic logic ,General Chemistry ,Search Engine ,Research Design ,Protein identification ,Data mining ,Peptides ,business ,computer ,Algorithms - Abstract
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
- Published
- 2011
43. Global Analysis of Protein Palmitoylation in African Trypanosomes
- Author
-
Alexey I. Nesvizhskii, Igor C. Almeida, Christina Souther, Ernesto S. Nakayasu, Brian T. Emmer, Tiago J. P. Sobreira, David M. Engman, Hyungwon Choi, and Conrad L. Epting
- Subjects
Lipoylation ,Molecular Sequence Data ,Trypanosoma brucei brucei ,Protozoan Proteins ,Sequence alignment ,Trypanosoma brucei ,Microbiology ,Mass Spectrometry ,Palmitoylation ,Humans ,Transferase ,Protein palmitoylation ,Amino Acid Sequence ,Molecular Biology ,Peptide sequence ,biology ,technology, industry, and agriculture ,Articles ,General Medicine ,biology.organism_classification ,Cell biology ,Trypanosomiasis, African ,Biochemistry ,Acyltransferases ,Proteome ,lipids (amino acids, peptides, and proteins) ,Sequence Alignment - Abstract
Many eukaryotic proteins are posttranslationally modified by the esterification of cysteine thiols to long-chain fatty acids. This modification, protein palmitoylation, is catalyzed by a large family of palmitoyl acyltransferases that share an Asp-His-His-Cys Cys-rich domain but differ in their subcellular localizations and substrate specificities. In Trypanosoma brucei, the flagellated protozoan parasite that causes African sleeping sickness, protein palmitoylation has been observed for a few proteins, but the extent and consequences of this modification are largely unknown. We undertook the present study to investigate T. brucei protein palmitoylation at both the enzyme and substrate levels. Treatment of parasites with an inhibitor of total protein palmitoylation caused potent growth inhibition, yet there was no effect on growth by the separate, selective inhibition of each of the 12 individual T. brucei palmitoyl acyltransferases. This suggested either that T. brucei evolved functional redundancy for the palmitoylation of essential palmitoyl proteins or that palmitoylation of some proteins is catalyzed by a noncanonical transferase. To identify the palmitoylated proteins in T. brucei, we performed acyl biotin exchange chemistry on parasite lysates, followed by streptavidin chromatography, two-dimensional liquid chromatography-tandem mass spectrometry protein identification, and QSpec statistical analysis. A total of 124 palmitoylated proteins were identified, with an estimated false discovery rate of 1.0%. This palmitoyl proteome includes all of the known palmitoyl proteins in procyclic-stage T. brucei as well as several proteins whose homologues are palmitoylated in other organisms. Their sequences demonstrate the variety of substrate motifs that support palmitoylation, and their identities illustrate the range of cellular processes affected by palmitoylation in these important pathogens.
- Published
- 2011
44. Label-free quantitative proteomics and SAINT analysis enable interactome mapping for the human Ser/Thr protein phosphatase 5
- Author
-
Danalea V. Skarra, Richard E. Honkanen, Marilyn Goudreault, Michael Mullin, Hyungwon Choi, Alexey I. Nesvizhskii, and Anne-Claude Gingras
- Subjects
Proteomics ,Chaperonins ,Quantitative proteomics ,Cell Cycle Proteins ,Biology ,Biochemistry ,Interactome ,Mass Spectrometry ,Article ,Cell Line ,Chaperonin ,Protein–protein interaction ,Protein Interaction Mapping ,Phosphoprotein Phosphatases ,Humans ,HSP90 Heat-Shock Proteins ,Molecular Biology ,Heat-Shock Proteins ,Adaptor Proteins, Signal Transducing ,Nuclear Proteins ,Signal transducing adaptor protein ,Tetratricopeptide ,Phosphoprotein ,Mutation ,Protein Binding - Abstract
Affinity purification coupled to mass spectrometry (AP-MS) represents a powerful and proven approach for the analysis of protein-protein interactions. However, the detection of true interactions for proteins that are commonly considered background contaminants is currently a limitation of AP-MS. Here using spectral counts and the new statistical tool, Significance Analysis of INTeractome (SAINT), true interaction between the serine/threonine protein phosphatase 5 (PP5) and a chaperonin, heat shock protein 90 (Hsp90), is discerned. Furthermore, we report and validate a new interaction between PP5 and an Hsp90 adaptor protein, stress-induced phosphoprotein 1 (STIP1; HOP). Mutation of PP5, replacing key basic amino acids (K97A and R101A) in the tetratricopeptide repeat (TPR) region known to be necessary for the interactions with Hsp90, abolished both the known interaction of PP5 with cell division cycle 37 homolog and the novel interaction of PP5 with stress-induced phosphoprotein 1. Taken together, the results presented demonstrate the usefulness of label-free quantitative proteomics and statistical tools to discriminate between noise and true interactions, even for proteins normally considered as background contaminants.
- Published
- 2011
45. Abacus: A computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis
- Author
-
Anastasia K. Yocum, Alexey I. Nesvizhskii, Damian Fermin, and Venkatesha Basrur
- Subjects
Proteomics ,Normalization (statistics) ,Computer science ,Bioinformatics ,Biochemistry ,Article ,Software ,Abacus (architecture) ,Sequence Analysis, Protein ,Tandem Mass Spectrometry ,Humans ,Databases, Protein ,Molecular Biology ,Graphical user interface ,Automation, Laboratory ,business.industry ,Suite ,Computational Biology ,Proteins ,Pattern recognition ,Automation ,Pipeline (software) ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial intelligence ,business ,Count data - Abstract
We describe Abacus, a computational tool for extracting spectral counts from tandem mass spectrometry based proteomic datasets. The program aggregates data from multiple experiments, adjusts spectral counts to accurately account for peptides shared across multiple proteins, and performs common normalization steps. It can also output the spectral count data at the gene level, thus simplifying the integration and comparison between gene and protein expression data. Abacus is compatible with the widely used Trans-Proteomic Pipeline suite of tools and comes with a graphical user interface making it easy to interact with the program. The main aim of Abacus is to streamline the analysis of spectral count data by providing an automated, easy to use solution for extracting this information from proteomic datasets for subsequent, more sophisticated statistical analysis.
- Published
- 2011
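The Abacus abstract above mentions adjusting spectral counts to account for peptides shared across multiple proteins, without stating the rule. One common scheme, shown here purely as an assumption and not as Abacus's documented method, splits each shared peptide's count among its candidate proteins in proportion to their counts from unique peptides. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet


def adjust_spectral_counts(
    peptide_counts: Dict[str, int],
    peptide_proteins: Dict[str, FrozenSet[str]],
) -> Dict[str, float]:
    """Distribute shared-peptide spectral counts across proteins.

    peptide_counts[pep]   : number of spectra matched to peptide pep
    peptide_proteins[pep] : set of proteins that contain peptide pep
    """
    # Step 1: counts from peptides that map to exactly one protein.
    unique = defaultdict(float)
    for pep, n in peptide_counts.items():
        prots = peptide_proteins[pep]
        if len(prots) == 1:
            (prot,) = prots
            unique[prot] += n

    # Step 2: split each shared peptide in proportion to unique counts.
    adjusted = defaultdict(float, unique)
    for pep, n in peptide_counts.items():
        prots = peptide_proteins[pep]
        if len(prots) < 2:
            continue
        total = sum(unique[p] for p in prots)
        for p in prots:
            # Assumed fallback: equal split when no candidate protein
            # has any unique evidence.
            share = unique[p] / total if total > 0 else 1.0 / len(prots)
            adjusted[p] += n * share
    return dict(adjusted)
```

With this rule the adjusted counts sum to the total number of spectra, so no spectrum is counted more than once.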
46. LuciPHOr2: site localization of generic post-translational modifications from tandem mass spectrometry data
- Author
-
Damian Fermin, Hyungwon Choi, Alexey I. Nesvizhskii, and Dmitry M. Avtonomov
- Subjects
Statistics and Probability ,Phosphorylation sites ,Computer science ,Muscle Proteins ,Tandem mass spectrometry ,computer.software_genre ,Biochemistry ,Sequence Analysis, Protein ,Tandem Mass Spectrometry ,Humans ,Phosphorylation ,Muscle, Skeletal ,Molecular Biology ,Lysine ,Acetylation ,Applications Notes ,Peptide Fragments ,Computer Science Applications ,Computational Mathematics ,A-site ,Computational Theory and Mathematics ,Posttranslational modification ,Data mining ,Threading (protein sequence) ,Protein Processing, Post-Translational ,computer ,Algorithms ,Software - Abstract
We present LuciPHOr2, a site localization tool for generic post-translational modifications (PTMs) using tandem mass spectrometry data. As an extension of the original LuciPHOr (version 1) for phosphorylation site localization, the new software provides a site-level localization score for generic PTMs and associated false discovery rate called the false localization rate. We describe several novel features such as operating system independence and reduced computation time through multiple threading. We also discuss optimal parameters for different types of data and illustrate the new tool on a human skeletal muscle dataset for lysine-acetylation. Availability and implementation: The software is freely available on the SourceForge website http://luciphor2.sourceforge.net. Contact: hyung_won_choi@nuhs.edu.sg, nesvi@med.umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2014
47. SAINT: probabilistic scoring of affinity purification–mass spectrometry data
- Author
-
Damian Fermin, Mike Tyers, Ashton Breitkreutz, Zhen Yuan Lin, Hyungwon Choi, Anne-Claude Gingras, Alexey I. Nesvizhskii, Dattatreya Mellacheruvu, Zhaohui S. Qin, and Brett Larsen
- Subjects
Computer science ,Bioinformatics ,Mass spectrometry ,Proteomics ,Biochemistry ,Interactome ,Article ,Chromatography, Affinity ,Mass Spectrometry ,03 medical and health sciences ,Affinity chromatography ,Protein Interaction Mapping ,Computer Simulation ,Molecular Biology ,Probability ,030304 developmental biology ,0303 health sciences ,business.industry ,Extramural ,030302 biochemistry & molecular biology ,Probabilistic logic ,Computational Biology ,Proteins ,Pattern recognition ,Cell Biology ,Proteins metabolism ,Artificial intelligence ,business ,Protein Binding ,Biotechnology - Abstract
We present SAINT (Significance Analysis of INTeractome), a computational tool that assigns confidence scores to protein-protein interaction data generated using affinity-purification coupled to mass spectrometry (AP-MS). The method utilizes label-free quantitative data and constructs separate distributions for true and false interactions to derive the probability of a bona fide protein-protein interaction. We demonstrate that SAINT is applicable to data of different scales and protein connectivity and allows for the transparent analysis of AP-MS data.
- Published
- 2010
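The SAINT abstract above describes constructing separate distributions for true and false interactions and deriving from them the probability of a bona fide interaction. The sketch below illustrates only that last step, via Bayes' rule for a single observed spectral count. The Poisson form, the fixed rates and the prior are assumptions made for illustration; SAINT estimates its distributions from the data, and the function names here are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing exactly k counts at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def interaction_probability(
    count: int, lam_true: float, lam_false: float, prior_true: float
) -> float:
    """Posterior probability that an observed count reflects a true interaction.

    lam_true / lam_false : assumed mean spectral counts for true and
                           false interactions
    prior_true           : assumed prior proportion of true interactions
    """
    t = prior_true * poisson_pmf(count, lam_true)
    f = (1.0 - prior_true) * poisson_pmf(count, lam_false)
    return t / (t + f)
```

When the two distributions coincide, the observation carries no information and the function returns the prior unchanged; the further apart they are, the more decisively a high or low count moves the score.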
48. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics
- Author
-
Alexey I. Nesvizhskii
- Subjects
Proteomics ,Chemistry ,Posterior probability ,Biophysics ,Computational Biology ,Proteins ,Word error rate ,Inference ,Statistical model ,Tandem mass spectrometry ,computer.software_genre ,Bioinformatics ,Biochemistry ,Mass Spectrometry ,Article ,Identification (information) ,False Positive Reactions ,Data mining ,Databases, Protein ,Peptides ,Shotgun proteomics ,computer - Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
- Published
- 2010
49. Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets
- Author
-
Kang Ning, Damian Fermin, and Alexey I. Nesvizhskii
- Subjects
Proteomics ,Multiple stages ,Ms ms spectra ,business.industry ,Computational Biology ,Pattern recognition ,Shotgun ,Biology ,Biochemistry ,Data science ,Genomic databases ,Article ,Mass Spectrometry ,Data set ,ComputingMethodologies_PATTERNRECOGNITION ,Peptide spectral library ,Cell Line, Tumor ,Humans ,Artificial intelligence ,Computational analysis ,Databases, Protein ,Peptides ,business ,Shotgun proteomics ,Molecular Biology - Abstract
In a typical shotgun proteomics experiment, a significant number of high-quality MS/MS spectra remain "unassigned." The main focus of this work is to improve our understanding of various sources of unassigned high-quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic data set.
- Published
- 2010
50. A guided tour of the Trans-Proteomic Pipeline
- Author
-
Bryan J. Prazen, Natalie Tasman, Zhi Sun, Jimmy K. Eng, David Shteynberg, Alexey I. Nesvizhskii, Luis Mendoza, Ruedi Aebersold, Henry H N Lam, Erik Nilsson, Daniel Martin, Brian S. Pratt, Terry Farrah, and Eric W. Deutsch
- Subjects
Proteomics ,Materials science ,business.industry ,Suite ,Integrated software ,Trans-Proteomic Pipeline ,Computational Biology ,Information Storage and Retrieval ,Biochemistry ,Pipeline (software) ,Combinatorial chemistry ,Article ,Data set ,ComputingMethodologies_PATTERNRECOGNITION ,Workflow ,Software ,Sequence Analysis, Protein ,Tandem Mass Spectrometry ,Isotope Labeling ,PeptideAtlas ,Databases, Protein ,Software engineering ,business ,Molecular Biology - Abstract
The Trans-Proteomic Pipeline (TPP) is a suite of software tools for the analysis of tandem mass spectrometry datasets. The tools encompass most of the steps in a proteomic data analysis workflow in a single, integrated software system. Specifically, the TPP supports all steps from spectrometer output file conversion to protein-level statistical validation, including quantification by stable isotope ratios. We describe here the full workflow of the TPP and the tools therein, along with an example on a sample dataset, demonstrating that the setup and use of the tools are straightforward and well supported and do not require specialized informatics resources or knowledge.
- Published
- 2010