146 results on '"protein subcellular location"'
Search Results
2. Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks.
- Author
-
Wang, Ge, Xue, Min-Qi, Shen, Hong-Bin, and Xu, Ying-Ying
- Subjects
*AMINO acid sequence , *PROTEINS , *DEEP learning , *PROTEIN expression , *PROTEIN-protein interactions
- Abstract
Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, thereby producing plenty of automated predictors for protein subcellular localization. However, most of these predictors are trained solely from high-throughput microscopic images or protein amino acid sequences alone. Unifying heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator) that constructs a multi-view description of proteins by integrating multiple data types including images of protein expression in cells or tissues, amino acid sequences and protein–protein interaction networks, to classify the patterns of protein subcellular locations. Proteins were encoded by both handcrafted features and deep learning features, and multiple combining methods were implemented. Our experimental results indicated that optimal integrations can considerably enhance the classification accuracy, and the utility of SIN-Locator has been demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we also investigate the contribution of different data sources and influence of partial absence of data. This work is anticipated to provide clues for reconciliation and combination of multi-source data for protein location analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
3. Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer
- Author
-
Zhen-Zhen Xue, Yanxia Wu, Qing-Zu Gao, Liang Zhao, and Ying-Ying Xu
- Subjects
Bioimage processing ,Bioinformatics ,Machine learning ,Protein subcellular location ,Cancer biomarkers ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5
- Abstract
Background: Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made on measuring abnormal expression intensity in biological samples to identify cancer types and stages. However, the change of subcellular location of proteins, which is also critical for understanding and detecting diseases, has been rarely studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers derived from published literatures and the human protein atlas database respectively are employed. It turns out that 81.82 and 65.66% of the biomarker proteins can be identified to change locations. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers.
- Published
- 2020
4. Predicting Human Protein Subcellular Locations by Using a Combination of Network and Function Features.
- Author
-
Chen, Lei, Li, ZhanDong, Zeng, Tao, Zhang, Yu-Hang, Zhang, ShiQi, Huang, Tao, and Cai, Yu-Dong
- Subjects
FEATURE selection ,PROTEIN-protein interactions ,MACHINE learning ,PROTEINS ,CLASSIFICATION algorithms ,POLYMER networks
- Abstract
Given the limitation of technologies, the subcellular localizations of proteins are difficult to identify. Predicting the subcellular localization and the intercellular distribution patterns of proteins in accordance with their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics, is necessary. The computational prediction of protein subcellular localizations can be performed on the basis of the sequence and the functional characteristics. In this study, the protein–protein interaction network, functional annotation of proteins and a group of direct proteins with known subcellular localization were used to construct models. To build efficient models, several powerful machine learning algorithms, including two feature selection methods, four classification algorithms, were employed. Some key proteins and functional terms were discovered, which may provide important contributions for determining protein subcellular locations. Furthermore, some quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model that uses direct protein annotation information (i.e., functional features) and STRING-based protein–protein interaction network (i.e., network features), our computational model can help promote the development of predictive technologies on subcellular localizations and provide a new approach for exploring the protein subcellular localization patterns and their potential biological importance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
5. PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection.
- Author
-
Ullah, Matee, Han, Ke, Hadi, Fazal, Xu, Jian, Song, Jiangning, and Yu, Dong-Jun
- Subjects
*FEATURE selection , *DEEP learning , *RADIAL basis functions , *PROTEOMICS , *SUPPORT vector machines , *DISCRIMINANT analysis
- Abstract
Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods have limited capability in terms of the overall accuracy, time consumption and generalization power. To address these problems, in this study, we developed a novel computational approach based on human protein atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted and deep learned (by employing pretrained deep learning model) features from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each original raw feature set. To further obtain a more informative feature subset, support vector machine–based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was applied to the integrated feature set. Finally, the classification models, namely support vector machine with radial basis function (SVM-RBF) and support vector machine with linear kernel (SVM-LNR), were learned on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the maximum performance on 10-fold cross validation test on this dataset and showed a better efficacy over existing predictors. Furthermore, we also illustrated the generalization ability of the proposed method by conducting a stringent independent validation test. [ABSTRACT FROM AUTHOR]
- Published
- 2021
6. Predicting Human Protein Subcellular Locations by Using a Combination of Network and Function Features
- Author
-
Lei Chen, ZhanDong Li, Tao Zeng, Yu-Hang Zhang, ShiQi Zhang, Tao Huang, and Yu-Dong Cai
- Subjects
protein subcellular location ,protein-protein interaction network ,GO enrichment ,KEGG enrichment ,feature selection ,classification algorithm ,Genetics ,QH426-470
- Abstract
Given the limitation of technologies, the subcellular localizations of proteins are difficult to identify. Predicting the subcellular localization and the intercellular distribution patterns of proteins in accordance with their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics, is necessary. The computational prediction of protein subcellular localizations can be performed on the basis of the sequence and the functional characteristics. In this study, the protein–protein interaction network, functional annotation of proteins and a group of direct proteins with known subcellular localization were used to construct models. To build efficient models, several powerful machine learning algorithms, including two feature selection methods, four classification algorithms, were employed. Some key proteins and functional terms were discovered, which may provide important contributions for determining protein subcellular locations. Furthermore, some quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model that uses direct protein annotation information (i.e., functional features) and STRING-based protein–protein interaction network (i.e., network features), our computational model can help promote the development of predictive technologies on subcellular localizations and provide a new approach for exploring the protein subcellular localization patterns and their potential biological importance.
- Published
- 2021
7. Protein subcellular localization based on deep image features and criterion learning strategy.
- Author
-
Su, Ran, He, Linlin, Liu, Tianling, Liu, Xiaofeng, and Wei, Leyi
- Subjects
*LEARNING strategies , *CONVOLUTIONAL neural networks , *IMMOBILIZED proteins , *HUMAN biology , *PROTEINS
- Abstract
The spatial distribution of proteome at subcellular levels provides clues for protein functions, thus is important to human biology and medicine. Imaging-based methods are one of the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, its application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize the proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Particularly, the multi-label prediction is quite a challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. A criterion that was used to determine the final label set was automatically obtained during the learning procedure. We concluded an optimal CNN architecture that could give the best results. Besides, experiments show that compared with the hand-crafted features, the deep features present more accurate prediction with less features. The implementation for the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
8. Consistency and variation of protein subcellular location annotations.
- Author
-
Xu, Ying‐Ying, Zhou, Hang, Murphy, Robert F., and Shen, Hong‐Bin
- Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human‐interpreted rather than primary data. For example, the Swiss‐Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high‐resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss‐Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss‐Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening. [ABSTRACT FROM AUTHOR]
- Published
- 2021
9. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks.
- Author
-
Liu, Guang-Hui, Zhang, Bei-Wei, Qian, Gang, Wang, Bin, Mao, Bo, and Bichindaritz, Isabelle
- Abstract
Prediction of protein subcellular location has currently become a hot topic because it has been proven to be useful for understanding both the disease mechanisms and novel drug design. With the rapid development of automated microscopic imaging technology in recent years, classification methods of bioimage-based protein subcellular location have attracted considerable attention for images can describe the protein distribution intuitively and in detail. In the current study, a prediction method of protein subcellular location was proposed based on multi-view image features that are extracted from three different views, including the four texture features of the original image, the global and local features of the protein extracted from the protein channel images after color segmentation, and the global features of DNA extracted from the DNA channel image. Finally, the extracted features were combined together to improve the performance of subcellular localization prediction. From the performance comparison of different combination features under the same classifier, the best ensemble features could be obtained. In this work, a classifier based on Stacked Auto-encoders and the random forest was also put forward. To improve the prediction results, the deep network was combined with the traditional statistical classification methods. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
10. Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer.
- Author
-
Xue, Zhen-Zhen, Wu, Yanxia, Gao, Qing-Zu, Zhao, Liang, and Xu, Ying-Ying
- Subjects
*COLON cancer , *HISTOCHEMISTRY , *BIOLOGICAL tags , *IDENTITY (Psychology) , *TUMOR classification , *PROTEINS , *COLON (Anatomy)
- Abstract
Background: Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made on measuring abnormal expression intensity in biological samples to identify cancer types and stages. However, the change of subcellular location of proteins, which is also critical for understanding and detecting diseases, has been rarely studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers derived from published literatures and the human protein atlas database respectively are employed. It turns out that 81.82 and 65.66% of the biomarker proteins can be identified to change locations. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
11. Predicting Sub-cellular Location of Proteins Based on Hierarchical Clustering and Hidden Markov Models
- Author
-
Jaramillo-Garzón, Jorge Alberto, Castro-Ceballos, Jacobo, Castellanos-Dominguez, Germán, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Istrail, Sorin, Series editor, Pevzner, Pavel, Series editor, Waterman, Michael S., Series editor, Ortuño, Francisco, editor, and Rojas, Ignacio, editor
- Published
- 2015
12. Identifying Protein Subcellular Location with Embedding Features Learned from Networks
- Author
-
Liu Hongwei, Lu Lin, Hu Bin, and Chen Lei
- Subjects
Protein subcellular location ,Embedding ,Computational biology ,Biology ,Molecular Biology ,Biochemistry
- Abstract
Background: Identification of protein subcellular location is an important problem because the subcellular location is highly related to protein function. It is fundamental to determine the locations with biology experiments. However, these experiments are costly and time-consuming. The alternative way to address such a problem is to design effective computational methods. Objective: To date, several computational methods have been proposed in this regard. However, these methods mainly adopted the features derived from the proteins themselves. On the other hand, with the development of the network technique, several embedding algorithms have been proposed, which can encode nodes in the network into feature vectors. Such algorithms connected the network and traditional classification algorithms. Thus, they provided a new way to construct models for the prediction of protein subcellular location. Methods: In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) that were applied on one or multiple protein networks. Obtained features were learned by one machine learning algorithm (support vector machine or random forest) to construct the model. The cross-validation method was adopted to evaluate all constructed models. Results: After evaluating models with the cross-validation method, embedding features yielded by Mashup on multiple networks were quite informative for predicting protein subcellular location. The model based on these features was superior to some classic models. Conclusion: Embedding features yielded by a proper and powerful network embedding algorithm were effective for building the model for prediction of protein subcellular location, providing new pipelines to build more efficient models.
- Published
- 2021
13. Principles of Bioimage Informatics: Focus on Machine Learning of Cell Patterns
- Author
-
Coelho, Luis Pedro, Glory-Afshar, Estelle, Kangas, Joshua, Quinn, Shannon, Shariff, Aabid, Murphy, Robert F., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael S., editor, Blaschke, Christian, editor, and Shatkay, Hagit, editor
- Published
- 2010
14. Automated, Systematic Determination of Protein Subcellular Location using Fluorescence Microscopy
- Author
-
Osuna, Elvira García, Murphy, Robert F., Harris, J. Robin, editor, Biswas, B.B., editor, Quinn, P., editor, Bertrand, Eric, editor, and Faupel, Michel, editor
- Published
- 2007
15. Extracting Features from Gene Ontology for the Identification of Protein Subcellular Location by Semantic Similarity Measurement
- Author
-
Li, Guoqi, Sheng, Huanye, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Washio, Takashi, editor, Zhou, Zhi-Hua, editor, Huang, Joshua Zhexue, editor, Hu, Xiaohua, editor, Li, Jinyan, editor, Xie, Chao, editor, He, Jieyue, editor, Zou, Deqing, editor, Li, Kuan-Ching, editor, and Freire, Mário M., editor
- Published
- 2007
16. Location Proteomics
- Author
-
Zhao, Ting, Chen, Shann-Ching, Murphy, Robert F., and Choi, Sangdun, editor
- Published
- 2007
17. Protein Subcellular Location Prediction Based on Pseudo Amino Acid Composition and Immune Genetic Algorithm
- Author
-
Zhang, Tongliang, Ding, Yongsheng, Shao, Shihuang, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael S., editor, Huang, De-Shuang, editor, Li, Kang, editor, and Irwin, George William, editor
- Published
- 2006
18. Prediction of Protein Subcellular Locations Using Support Vector Machines
- Author
-
Li, Na-na, Niu, Xiao-hui, Shi, Feng, Li, Xue-yan, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Wang, Lipo, editor, Chen, Ke, editor, and Ong, Yew Soon, editor
- Published
- 2005
19. Active machine learning-driven experimentation to determine compound effects on protein patterns
- Author
-
Armaghan W Naik, Joshua D Kangas, Devin P Sullivan, and Robert F Murphy
- Subjects
active learning ,protein subcellular location ,laboratory automation ,high content screening ,automation of research ,machine learning ,Medicine ,Science ,Biology (General) ,QH301-705.5
- Abstract
High throughput screening determines the effects of many conditions on a given biological target. Currently, to estimate the effects of those conditions on other targets requires either strong modeling assumptions (e.g. similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without doing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance.
- Published
- 2016
20. Consistency and variation of protein subcellular location annotations
- Author
-
Robert F. Murphy, Hong-Bin Shen, Hang Zhou, and Ying-Ying Xu
- Subjects
Biomarker identification ,Human Protein Atlas ,Computational biology ,Variation (game tree) ,Biology ,Biochemistry ,Article ,Cell Line ,Protein location ,Variable locations ,03 medical and health sciences ,Consistency (database systems) ,Atlases as Topic ,Protein sequencing ,Structural Biology ,Protein subcellular location ,Humans ,Databases, Protein ,Molecular Biology ,030304 developmental biology ,Observer Variation ,0303 health sciences ,030302 biochemistry & molecular biology ,Uncertainty ,Proteins ,Reproducibility of Results ,Molecular Sequence Annotation ,Cell Compartmentation ,Eukaryotic Cells
- Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.
- Published
- 2020
21. SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks
- Author
-
Jeremy C. Simpson, Gianluca Pollastri, Manaz Kaleel, Xuanming Feng, Yandan Zheng, Catherine Mooney, and Jialiang Chen
- Subjects
Statistics and Probability ,Computer science ,Computational biology ,Secretory pathway ,Biochemistry ,Convolutional neural network ,Machine Learning ,03 medical and health sciences ,Protein subcellular location ,Machine learning ,Protein function prediction ,Endomembrane system ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Secretory Pathway ,Artificial neural network ,030302 biochemistry & molecular biology ,Computational Biology ,Proteins ,A protein ,Subcellular localization ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Neural Networks, Computer ,Algorithms ,Neural networks
- Abstract
Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS predicts the subcellular location of a protein into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. Contact: catherine.mooney@ucd.ie
- Published
- 2020
22. Immunogenic potential of neopeptides depends on parent protein subcellular location
- Author
-
Maurizio Zanetti, Andrea Castro, Hannah Carter, William H. Hildebrand, and Saghar Kaabinejadian
- Subjects
chemistry.chemical_classification ,biology ,medicine.medical_treatment ,Immunogenicity ,Antigen presentation ,Peptide ,Immunotherapy ,Computational biology ,Major histocompatibility complex ,Immune checkpoint ,chemistry ,Protein subcellular location ,Immunity ,medicine ,biology.protein
- Abstract
Antigen presentation via the major histocompatibility complex (MHC) is essential for anti-tumor immunity, however the rules that determine what tumor-derived peptides will be immunogenic are still incompletely understood. Here we investigate whether protein subcellular location driven constraints on accessibility of peptides to the MHC associate with potential for peptide immunogenicity. Analyzing over 380,000 peptides from studies of MHC presentation and peptide immunogenicity, we find clear spatial biases in both eluted and immunogenic peptides. We find that including parent protein location improves prediction of peptide immunogenicity in multiple datasets. In human immunotherapy cohorts, location was associated with response to a neoantigen vaccine, and immune checkpoint blockade responders generally had a higher burden of neopeptides from accessible locations. We conclude that protein subcellular location adds important information for optimizing immunotherapies. Highlights: Peptides eluted from class I and II MHC reflect biases in the subcellular location of the parent proteins. An embedding-based indicator of parent protein location improves prediction of neoepitope immunogenicity and immunotherapy response. Neoepitope location improves estimation of effective neoantigen burden and stratification of potential for immunotherapy response.
- Published
- 2021
23. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses
- Author
-
Kuo-Chen Chou
- Subjects
chemistry.chemical_classification ,010405 organic chemistry ,Pharmacology toxicology ,Bioengineering ,Structural classification ,Computational biology ,Biology ,01 natural sciences ,Biochemistry ,Genome ,0104 chemical sciences ,Analytical Chemistry ,Amino acid ,chemistry ,Protein subcellular location ,Drug Discovery ,Proteome ,Milestone (project management) ,Molecular Medicine
- Abstract
In this minireview paper it has been elucidated that the proposal of pseudo amino acid components represents a very important milestone for the disciplines of proteome and genome. This has been concluded by observing and analyzing the developments in the following six different sub-disciplines: (1) proteome analysis; (2) genome analysis; (3) protein structural classification; (4) protein subcellular location prediction; (5) post-translational modification (PTM) site prediction; (6) stimulating the birth of the renowned and very powerful 5-steps rule.
- Published
- 2019
- Full Text
- View/download PDF
24. Approach the Answer Step by Step–Application of Active Learning in Protein Subcellular Location Patterns
- Author
-
Shi Deng
- Subjects
Computer science ,business.industry ,Active learning (machine learning) ,Supervised learning ,Experimental data ,Machine learning ,computer.software_genre ,Protein subcellular location ,Line (geometry) ,Material resources ,Artificial intelligence ,business ,computer ,Predictive modelling - Abstract
When it comes to biological experiments, especially experiments related to protein and drug binding, exhaustive experimentation is not feasible because it would cost a lot of manpower and material resources [1]. Therefore, it has become popular to select a series of experiments to carry out and to learn from them a model that predicts the results of the unfinished experiments, that is, active learning. Based on existing experimental data, this paper discusses the feasibility of machine learning in biological experiments. Ordinary supervised learning and active learning are each used to build prediction models. The difference between them is that active learning purposefully selects the "most useful" data over multiple rounds of learning, which is more in line with the needs of actual experiments. The result is that the accuracy of active learning is slightly higher than that of ordinary supervised learning when almost one fifth of the data of ordinary supervised learning is used.
- Published
- 2021
- Full Text
- View/download PDF
25. Text as data: Using text-based features for proteins representation and for computational prediction of their characteristics.
- Author
-
Shatkay, Hagit, Brady, Scott, and Wong, Andrew
- Subjects
- *
NUCLEOTIDE sequencing , *PREDICTION models , *TEXT mining , *BIOINFORMATICS , *MACHINE learning - Abstract
The current era of large-scale biology is characterized by a fast-paced growth in the number of sequenced genomes and, consequently, by a multitude of identified proteins whose function has yet to be determined. Simultaneously, any known or postulated information concerning genes and proteins is part of the ever-growing published scientific literature, which is expanding at a rate of over a million new publications per year. Computational tools that attempt to automatically predict and annotate protein characteristics, such as function and localization patterns, are being developed along with systems that aim to support the process via text mining. Most work on protein characterization focuses on features derived directly from protein sequence data. Protein-related work that does aim to utilize the literature typically concentrates on extracting specific facts (e.g., protein interactions) from text. In the past few years we have taken a different route, treating the literature as a source of text-based features, which can be employed just as sequence-based protein-features were used in earlier work, for predicting protein subcellular location and possibly also function. We discuss here in detail the overall approach, along with results from work we have done in this area demonstrating the value of this method and its potential use. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
26. Incorporating label correlations into deep neural networks to classify protein subcellular location patterns in immunohistochemistry images
- Author
-
Ying-Ying Xu, Yang Yang, Hong-Bin Shen, and Jin-Xian Hu
- Subjects
Artificial neural network ,business.industry ,Computer science ,Deep learning ,Proteins ,Pattern recognition ,Subcellular localization ,Proteomics ,Biochemistry ,Convolutional neural network ,Immunohistochemistry ,Protein Transport ,Recurrent neural network ,Deep Learning ,Structural Biology ,Protein subcellular location ,Image noise ,Humans ,Artificial intelligence ,Neural Networks, Computer ,business ,Molecular Biology - Abstract
Analysis of protein subcellular localization is a critical part of proteomics. In recent years, as both the number and quality of microscopic images are increasing rapidly, many automated methods, especially convolutional neural networks (CNN), have been developed to predict protein subcellular location(s) based on bioimages, but their performance always suffers from some inherent properties of the problem. First, many microscopic images have non-informative or noisy sections, like unstained stroma and unspecific background, which affect the extraction of protein expression information. Second, the patterns of protein subcellular localization are very complex, as a lot of proteins locate in more than one compartment. In this study, we propose a new label-correlation enhanced deep neural network, laceDNN, to classify the subcellular locations of multi-label proteins from immunohistochemistry images. The model uses small representative patches as input to alleviate the image noise issue, and its backbone is a hybrid architecture of CNN and recurrent neural network, where the former network extracts representative image features and the latter learns the organelle dependency relationships. Our experimental results indicate that the proposed model can improve the performance of multi-label protein subcellular classification.
- Published
- 2021
27. Localization of Organelle Proteins by Isotope Tagging: Current status and potential applications in drug discovery research
- Author
-
Lisa M. Breckels, Mohamed Elzek, Kathryn S. Lilley, Josie A. Christopher, Christopher, Josie [0000-0001-7077-4894], Lilley, Kathryn [0000-0003-0594-6543], and Apollo - University of Cambridge Repository
- Subjects
Organelles ,Proteomics ,0303 health sciences ,Protein function ,Proteome ,Drug discovery ,Computational biology ,Biology ,03 medical and health sciences ,0302 clinical medicine ,Isotopes ,Protein subcellular location ,Organelle ,Drug Discovery ,Molecular Medicine ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Spatial proteomics has provided important insights into the relationship between protein function and subcellular location. Localization of Organelle Proteins by Isotope Tagging (LOPIT) and its variants are proteome-wide techniques, not matched in scale by microscopy-based or proximity tagging-based techniques, allowing holistic mapping of protein subcellular location and re-localization events downstream of cellular perturbations. LOPIT can be a powerful and versatile tool in drug discovery for unlocking important information on disease pathophysiology, drug mechanism of action, and off-target toxicity screenings. Here, we discuss technical concepts of LOPIT with its potential applications in drug discovery and development research.
- Published
- 2021
28. Automated analysis of immunohistochemistry images identifies candidate location biomarkers for cancers.
- Author
-
Kumar, Aparna, Rao, Arvind, Bhavani, Santosh, Newberg, Justin Y., and Murphy, Robert F.
- Subjects
- *
BIOMARKERS , *IMMUNOCHEMISTRY , *IMMUNOHISTOCHEMISTRY , *HISTOCHEMISTRY , *BIOCHEMISTRY , *CANCER - Abstract
Molecular biomarkers are changes measured in biological samples that reflect disease states. Such markers can help clinicians identify types of cancer or stages of progression, and they can guide in tailoring specific therapies. Many efforts to identify biomarkers consider genes that mutate between normal and cancerous tissues or changes in protein or RNA expression levels. Here we define location biomarkers, proteins that undergo changes in subcellular location that are indicative of disease. To discover such biomarkers, we have developed an automated pipeline to compare the subcellular location of proteins between two sets of immunohistochemistry images. We used the pipeline to compare images of healthy and tumor tissue from the Human Protein Atlas, ranking hundreds of proteins in breast, liver, prostate, and bladder based on how much their location was estimated to have changed. The performance of the system was evaluated by determining whether proteins previously known to change location in tumors were ranked highly. We present a number of candidate location biomarkers for each tissue, and identify biochemical pathways that are enriched in proteins that change location. The analysis technology is anticipated to be useful not only for discovering new location biomarkers but also for enabling automated analysis of biomarker distributions as an aid to determining diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
29. Protein subcellular localization based on deep image features and criterion learning strategy
- Author
-
Linlin He, Ran Su, Xiaofeng Liu, Leyi Wei, and Tianling Liu
- Subjects
0303 health sciences ,Proteome ,Computer science ,business.industry ,Pattern recognition ,02 engineering and technology ,Subcellular localization ,Convolutional neural network ,Task (project management) ,Set (abstract data type) ,03 medical and health sciences ,Protein subcellular location ,Image Processing, Computer-Assisted ,0202 electrical engineering, electronic engineering, information engineering ,Humans ,Deep neural networks ,020201 artificial intelligence & image processing ,Neural Networks, Computer ,Artificial intelligence ,business ,Molecular Biology ,030304 developmental biology ,Information Systems - Abstract
The spatial distribution of the proteome at the subcellular level provides clues to protein function and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at the subcellular level. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is quite a challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. A criterion that was used to determine the final label set was automatically obtained during the learning procedure. We identified an optimal CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features give more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.
- Published
- 2020
- Full Text
- View/download PDF
30. Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer
- Author
-
Yanxia Wu, Qing-Zu Gao, Zhen-Zhen Xue, Liang Zhao, and Ying-Ying Xu
- Subjects
Feature engineering ,Colorectal cancer ,Bioinformatics ,Human Protein Atlas ,Protein subcellular location ,Computational biology ,Biology ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,Structural Biology ,Bioimage processing ,medicine ,Biomarkers, Tumor ,Humans ,Databases, Protein ,lcsh:QH301-705.5 ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Applied Mathematics ,Cancer ,Proteins ,medicine.disease ,Subcellular localization ,Immunohistochemistry ,Computer Science Applications ,lcsh:Biology (General) ,Colonic Neoplasms ,lcsh:R858-859.7 ,Biomarker (medicine) ,Cancer biomarkers ,DNA microarray ,030217 neurology & neurosurgery ,Research Article - Abstract
Background: Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made to measure abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting diseases, have rarely been studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers, derived from the published literature and the Human Protein Atlas database respectively, are employed. It turns out that 81.82% and 65.66% of the biomarker proteins, respectively, can be identified as changing locations. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers.
- Published
- 2020
31. Automated Protein Subcellular Localization Based on Local Invariant Features.
- Author
-
Li, Chao, Wang, Xue-hong, Zheng, Li, and Huang, Ji-feng
- Subjects
- *
SUBCELLULAR fractionation , *PROTEIN analysis , *FLUORESCENCE microscopy , *MOLECULAR rotation , *CELL analysis - Abstract
To understand the function of encoded proteins, we need to know their subcellular location. The most common method for determining subcellular location is fluorescence microscopy, which allows subcellular localizations to be imaged in high throughput. Image feature calculation has proven invaluable in the automated analysis of cellular images. This article proposes a novel feature extraction method named LDPs that is invariant to translation and rotation of the given images. Its essence is to count the local difference features of an image, where each difference feature is the D-value between the gray value of the central pixel c and the gray values of the eight pixels in its neighborhood. The method is tested on two image sets: in the first, fluorescently tagged proteins were endogenously expressed in 10 subcellular locations, and in the second, proteins were transfected in 11 locations. An SVM was trained and tested for each image set, and classification accuracies of 96.7% and 92.3% were obtained on the endogenous and transfected sets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
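The local difference idea described in the abstract of record 31 can be illustrated with a minimal sketch. This is not the authors' LDPs implementation: the function names, the choice to pool absolute differences into a histogram, and the `max_level` parameter are assumptions made for illustration only. It assumes a grayscale image given as a list of rows of integers, at least 3 x 3 in size.

```python
# Offsets (row, column) of the eight neighbours around a central pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                     (0, -1),           (0, 1),
                     (1, -1),  (1, 0),  (1, 1)]


def local_differences(image):
    """For every interior pixel, list the eight neighbour-minus-centre gray-value differences."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            result.append([image[r + dr][c + dc] - centre
                           for dr, dc in NEIGHBOUR_OFFSETS])
    return result


def difference_histogram(image, max_level=255):
    """Hypothetical feature vector: relative frequency of each absolute difference value.

    Pooling absolute differences over all pixels and all eight neighbours is an
    assumption of this sketch; it makes the vector unchanged by 90-degree rotations.
    """
    counts = [0] * (max_level + 1)
    total = 0
    for diffs in local_differences(image):
        for d in diffs:
            counts[abs(d)] += 1
            total += 1
    return [n / total for n in counts]
```

A feature vector of this kind could then be passed to any standard classifier; the abstract reports using an SVM.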
32. Application of PCA method to predicting protein subcellular location.
- Author
-
Ma Jun-wei, Shi Duo, Gu Hong, and Zhang Jie
- Subjects
PRINCIPAL components analysis ,PROTEINS ,ALGORITHMS ,AMINO acids ,BIOLOGY - Abstract
The subcellular location of a protein is closely correlated with its biological function. With the rapid expansion of protein databases, it is very important to design a powerful high-throughput algorithm for predicting protein subcellular location. Many prediction tools have been designed based on the pseudo-amino acid composition. Here a data analysis method, principal component analysis (PCA), is applied to determine in advance the optimal value of A, which reflects sequence order effects. First, the parameter A is set to its maximum so as to contain more sequence order information; then, PCA is employed to extract the essential features. Experimental results show that the proposed method solves the above problem, and its performance is better than that of other predictors. [ABSTRACT FROM AUTHOR]
- Published
- 2012
33. CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition
- Author
-
Khan, Asifullah, Majid, Abdul, and Hayat, Maqsood
- Subjects
- *
PROTEINS , *AMINO acids , *PEPTIDES , *PROTEIN-protein interactions , *PREDICTION theory , *SUPPORT vector machines - Abstract
Precise information about protein locations in a cell facilitates understanding of the function of a protein and its interaction in the cellular environment. This information further helps in the study of specific metabolic pathways and other biological processes. We propose an ensemble approach called “CE-PLoc” for predicting subcellular locations based on fusion of individual classifiers. The proposed approach utilizes features obtained from both dipeptide composition (DC) and amphiphilic pseudo amino acid composition (PseAAC) based feature extraction strategies. Different feature spaces are obtained by varying the dimensionality using PseAAC for a selected base learner. The performance of the individual learning mechanisms, such as support vector machine, nearest neighbor, probabilistic neural network, and covariant discriminant, which are trained using PseAAC based features, is first analyzed. Classifiers are developed using the same learning mechanism but trained on PseAAC based feature spaces of varying dimensions. These classifiers are combined through a voting strategy and an improvement in prediction performance is achieved. Prediction performance is further enhanced by developing CE-PLoc through the combination of different learning mechanisms trained on both the DC based feature space and PseAAC based feature spaces of varying dimensions. The predictive performance of the proposed CE-PLoc is evaluated on two benchmark datasets of protein subcellular locations using accuracy, MCC, and Q-statistics. Using the jackknife test, prediction accuracies of 81.47% and 83.99% are obtained for the 12 and 14 subcellular location datasets, respectively. In the independent dataset test, prediction accuracies are 87.04% and 87.33% for the 12 and 14 class datasets, respectively. [Copyright Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
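The abstract of record 33 says that individual classifiers are combined through a voting strategy but does not give the rule. A minimal sketch of plain majority voting follows; the function name and the tie-breaking rule (the earliest classifier among the tied labels wins) are assumptions, not details taken from CE-PLoc.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one predicted location label per classifier.
    Ties are broken in favour of the label that appears first in the list,
    which is an arbitrary choice made for this sketch.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Three hypothetical base classifiers voting on one protein.
print(majority_vote(["nuclear", "cytoplasmic", "nuclear"]))
```

Weighted voting, where each classifier's vote is scaled by its validation accuracy, is a common alternative when the base learners differ markedly in quality.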
34. LAB-Secretome: a genome-scale comparative analysis of the predicted extracellular and surface-associated proteins of Lactic Acid Bacteria.
- Author
-
Zhou, Miaomiao, Theunissen, Daniel, Wels, Michiel, and Siezen, Roland J
- Subjects
- *
LACTIC acid bacteria , *BACTERIAL genomes , *MOLECULAR evolution , *COMPARATIVE studies , *COMPARATIVE genomics , *PROTEINS , *CELL metabolism - Abstract
Background: In Lactic Acid Bacteria (LAB), the extracellular and surface-associated proteins can be involved in processes such as cell wall metabolism, degradation and uptake of nutrients, communication and binding to substrates or hosts. A genome-scale comparative study of these proteins (secretomes) can provide vast information towards the understanding of the molecular evolution, diversity, function and adaptation of LAB to their specific environmental niches. Results: We have performed an extensive prediction and comparison of the secretomes from 26 sequenced LAB genomes. A new approach to detect homolog clusters of secretome proteins (LaCOGs) was designed by integrating protein subcellular location prediction and homology clustering methods. The initial clusters were further adjusted semi-manually based on multiple sequence alignments, domain compositions, pseudogene analysis and biological function of the proteins. Ubiquitous protein families were identified, as well as species-specific, strain-specific, and niche-specific LaCOGs. Comparative analysis of protein subfamilies has shown that the distribution and functional specificity of LaCOGs could be used to explain many niche-specific phenotypes. A comprehensive and user-friendly database LAB-Secretome was constructed to store, visualize and update the extracellular proteins and LaCOGs http://www.cmbi.ru.nl/lab%5fsecretome/. This database will be updated regularly when new bacterial genomes become available. Conclusions: The LAB-Secretome database could be used to understand the evolution and adaptation of lactic acid bacteria to their environmental niches, to improve protein functional annotation and to serve as basis for targeted experimental studies. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
35. Predicting protein subcellular location: exploiting amino acid based sequence of feature spaces and fusion of diverse classifiers.
- Author
-
Khan, Asifullah, Majid, Abdul, and Tae-Sun Choi
- Subjects
- *
AMINO acids , *AMINO compounds , *PROTEINS , *JACKKNIFE (Statistics) , *ORGANIC acids - Abstract
A novel approach CE-Ploc is proposed for predicting protein subcellular locations by exploiting diversity both in feature and decision spaces. The diversity in a sequence of feature spaces is exploited using hydrophobicity and hydrophilicity of amphiphilic pseudo amino acid composition and a specific learning mechanism. Diversity in learning mechanisms is exploited by fusion of classifiers that are based on different learning mechanisms. Significant improvement in prediction performance is observed using jackknife and independent dataset tests. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
36. pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information
- Author
-
Kuo-Chen Chou, Xiang Cheng, and Xuan Xiao
- Subjects
0301 basic medicine ,Statistics and Probability ,Winnow ,Computer science ,Cell ,Machine learning ,computer.software_genre ,Biochemistry ,03 medical and health sciences ,Sequence Analysis, Protein ,Protein subcellular location ,medicine ,Humans ,Molecular Biology ,Pseudo amino acid composition ,Human proteins ,business.industry ,Computational Biology ,Subcellular localization ,Computer Science Applications ,Protein Transport ,Computational Mathematics ,Gene Ontology ,030104 developmental biology ,medicine.anatomical_structure ,Computational Theory and Mathematics ,Benchmark (computing) ,Artificial intelligence ,business ,computer ,Software - Abstract
Motivation: For an in-depth understanding of the functions of proteins in a cell, knowledge of their subcellular localization is indispensable. The current study is focused on human protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions that are particularly important for both basic research and drug design. Results: Using the multi-label theory, we present a new predictor called ‘pLoc-mHum’ by extracting the crucial GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on the same stringent benchmark dataset have indicated that the proposed pLoc-mHum predictor is remarkably superior to iLoc-Hum, the state-of-the-art method in predicting the human protein subcellular localization. Availability and implementation: To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mHum/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017
- Full Text
- View/download PDF
37. Bioimage-based protein subcellular location prediction: a comprehensive review
- Author
-
Hong-Bin Shen, Li-Xiu Yao, and Ying-Ying Xu
- Subjects
0301 basic medicine ,General Computer Science ,Computer science ,Bioimage informatics ,Computational biology ,Protein distribution ,computer.software_genre ,Subcellular localization ,Field (computer science) ,Theoretical Computer Science ,03 medical and health sciences ,030104 developmental biology ,Protein subcellular location ,Classification methods ,Protein translocation ,Data mining ,computer - Abstract
Subcellular localization of proteins can provide key hints to infer their functions and structures in cells. With the breakthrough of recent molecule imaging techniques, the usage of 2D bioimages has become increasingly popular in automatically analyzing protein subcellular location patterns. Compared with the widely used protein 1D amino acid sequence data, images of protein distribution are more intuitive and interpretable, making them a better choice in many applications for revealing the dynamic characteristics of proteins, such as detecting protein translocation and quantifying proteins. In this paper, we systematically review recent progress in the field of automated image-based protein subcellular location prediction, and classify it into four categories: growth of bioimage databases, description of subcellular location distribution patterns, classification methods, and applications of the prediction systems. We also discuss some potential directions in this field.
- Published
- 2017
- Full Text
- View/download PDF
38. Prediction of protein subcellular location using a combined feature of sequence
- Author
-
Gao, Qing-Bin, Wang, Zheng-Zhi, Yan, Chun, and Du, Yao-Hua
- Subjects
- *
AMINO acids , *BIOMOLECULES , *EUKARYOTIC cells , *PROTEINS - Abstract
To understand the structure and function of a protein, an important task is to know where it occurs in the cell. Thus, a computational method for properly predicting the subcellular location of proteins would be significant in interpreting the original data produced by the large-scale genome sequencing projects. The present work tries to explore an effective method for extracting features from protein primary sequence and find a novel measurement of similarity among proteins for classifying a protein to its proper subcellular location. We considered four locations in eukaryotic cells and three locations in prokaryotic cells, which have been investigated by several groups in the past. A combined feature of primary sequence defined as a 430D (dimensional) vector was utilized to represent a protein, including 20 amino acid compositions, 400 dipeptide compositions and 10 physicochemical properties. To evaluate the prediction performance of this encoding scheme, a jackknife test based on the nearest neighbor algorithm was employed. The prediction accuracies for cytoplasmic, extracellular, mitochondrial, and nuclear proteins in the former dataset were 86.3%, 89.2%, 73.5% and 89.4%, respectively, and the total prediction accuracy reached 86.3%. For cytoplasmic, extracellular, and periplasmic proteins in the latter dataset, the prediction accuracies were 97.4%, 86.0%, and 79.7%, respectively, and a total prediction accuracy of 92.5% was achieved. The results indicate that this method outperforms some existing approaches based on amino acid composition or amino acid composition and dipeptide composition. [Copyright Elsevier]
- Published
- 2005
- Full Text
- View/download PDF
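The encoding and evaluation described in the abstract of record 38 can be sketched as follows. This is an illustration only, under several assumptions: it builds just the 20 amino acid and 400 dipeptide composition components (the 10 physicochemical properties are not named in the abstract, so they are left out), it uses squared Euclidean distance rather than the paper's own similarity measure, and all function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(sequence):
    """Return a 420-dimensional vector: 20 amino acid + 400 dipeptide frequencies."""
    # Simplification: non-standard residues are dropped before counting.
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if len(seq) < 2:
        raise ValueError("need at least two standard residues")
    aa_comp = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    pair_counts = dict.fromkeys(DIPEPTIDES, 0)
    for first, second in zip(seq, seq[1:]):
        pair_counts[first + second] += 1
    dp_comp = [pair_counts[dp] / (len(seq) - 1) for dp in DIPEPTIDES]
    return aa_comp + dp_comp


def jackknife_accuracy(vectors, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, (query, truth) in enumerate(zip(vectors, labels)):
        best_label, best_dist = None, float("inf")
        for j, (other, label) in enumerate(zip(vectors, labels)):
            if i == j:
                continue  # the held-out protein may not vote for itself
            dist = sum((a - b) ** 2 for a, b in zip(query, other))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == truth
    return correct / len(vectors)
```

In the jackknife test each protein is held out in turn and classified from all the others, so every protein is used for testing exactly once.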
39. Prediction of protein subcellular locations by GO–FunD–PseAA predictor
- Author
-
Chou, Kuo-Chen and Cai, Yu-Dong
- Subjects
- *
PROTEINS , *ONTOLOGY , *AMINO acids , *BIOINFORMATICS - Abstract
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering into DataBanks, it is highly desired to develop an automated method that can fast identify their subcellular location. This will expedite the annotation process, providing timely useful information for both basic research and industrial application. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, the recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate of prediction obtained by the jackknife cross-validation was 92%. This is so far the highest success rate performed on this dataset by following an objective and rigorous cross-validation procedure. [Copyright Elsevier]
- Published
- 2004
- Full Text
- View/download PDF
40. A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology
- Author
-
Chou, Kuo-Chen and Cai, Yu-Dong
- Subjects
- *
GENES , *PROTEINS , *AMINO acids , *ALGORITHMS - Abstract
Based on the recent development in the gene ontology and functional domain databases, a new hybridization approach is developed for predicting protein subcellular location by combining the gene product, functional domain, and quasi-sequence-order effects. As a showcase, the same prokaryotic and eukaryotic datasets, which were studied by many previous investigators, are used for demonstration. The overall success rate by the jackknife test for the prokaryotic set is 94.7% and that for the eukaryotic set 92.9%. These are so far the highest success rates achieved for the two datasets by following a rigorous cross-validation test procedure, suggesting that such a hybrid approach may become a very useful high-throughput tool in the area of bioinformatics, proteomics, as well as molecular cell biology. The very high success rates also reflect the fact that the subcellular localization of a protein is closely correlated with: (1) the biological objective to which the gene or gene product contributes, (2) the biochemical activity of a gene product, and (3) the place in the cell where a gene product is active. [Copyright Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
41. Use of correspondence discriminant analysis to predict the subcellular location of bacterial proteins
- Author
-
Perrière, Guy and Thioulouse, Jean
- Subjects
- *
DISCRIMINANT analysis , *BACTERIA , *PROTEINS - Abstract
Correspondence discriminant analysis (CDA) is a multivariate statistical method derived from discriminant analysis which can be used on contingency tables. We have used CDA to separate Gram negative bacteria proteins according to their subcellular location. The high resolution of the discrimination obtained makes this method a good tool to predict subcellular location when this information is not known. The main advantage of this technique is its simplicity. Indeed, by computing two linear formulae on amino acid composition, it is possible to classify a protein into one of the three classes of subcellular location we have defined. The CDA itself can be computed with the ADE-4 software package that can be downloaded, as well as the data set used in this study, from the Pôle Bio-Informatique Lyonnais (PBIL) server at http://pbil.univ-lyon1.fr. [Copyright Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
42. Predicting protein subcellular location using learned distributed representations from a protein-protein network
- Author
-
Min Liu, Lei Chen, Tao Huang, Xiaoyong Pan, and Yu-Dong Cai
- Subjects
Recurrent neural network ,business.industry ,Computer science ,Protein subcellular location ,Deep learning ,A protein ,Pattern recognition ,Artificial intelligence ,business ,Classifier (UML) ,Protein network - Abstract
Functions of proteins are in general related to their subcellular locations. To identify the functions of a protein, we first need to know where this protein is located. Interacting proteins tend to locate in the same subcellular location. Thus, it is imperative to take the protein-protein interactions into account for computational identification of protein subcellular locations. In this study, we present a deep learning-based method, node2loc, to predict protein subcellular location. node2loc first learns distributed representations of proteins in a protein-protein network using node2vec, which acquires representations from unlabeled data for downstream tasks. Then the learned representations are further fed into a recurrent neural network (RNN) to predict subcellular locations. Considering the severe class imbalance of different subcellular locations, Synthetic Minority Over-sampling Technique (SMOTE) is applied to artificially boost subcellular locations with few proteins. We construct a benchmark dataset with 16 subcellular locations and evaluate node2loc on this dataset. node2loc yields a Matthews correlation coefficient (MCC) value of 0.812, which outperforms other baseline methods. The results demonstrate that the learned representations from a protein-protein network have strong discriminative ability for classifying protein subcellular locations and the RNN is a more powerful classifier than traditional machine learning models. node2loc is freely available at https://github.com/xypan1232/node2loc.
- Published
- 2019
- Full Text
- View/download PDF
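The abstract above applies SMOTE to locations with few proteins. The core step of SMOTE is to create a synthetic sample on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows that step in plain Python. The function name, the default `k`, and the toy vectors are assumptions for illustration; this is not the node2loc code, which may rely on an existing library implementation.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def smote_oversample(
    minority: List[Vector], n_new: int, k: int = 3, seed: int = 0
) -> List[Tuple[float, ...]]:
    """Generate `n_new` synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank every other minority sample by squared distance to the base.
        others = [p for j, p in enumerate(minority) if j != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Toy minority class (hypothetical embedding vectors).
minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_oversample(minority_class, n_new=5)
```

Each synthetic point is a convex combination of two real samples, so it always stays inside the region already occupied by the minority class.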
43. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis
- Author
-
Kuo-Chen Chou
- Subjects
Proteomics ,Proteome ,Computer science ,Computational biology ,03 medical and health sciences ,0302 clinical medicine ,Protein subcellular location ,Drug Discovery ,Feature (machine learning) ,Animals ,Humans ,Amino Acids ,030304 developmental biology ,chemistry.chemical_classification ,0303 health sciences ,Protein structural class ,Computational Biology ,General Medicine ,Amino acid ,chemistry ,030220 oncology & carcinogenesis ,Posttranslational modification ,Protein Processing, Post-Translational ,Software - Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one is for inspecting the global prediction quality, while the other is for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
- Published
- 2019
44. MIC_Locator: a novel image-based protein subcellular location multi-label prediction model based on multi-scale monogenic signal representation and intensity encoding strategy
- Author
-
Zhijian Yin, Fan Yang, Yanbin Wang, Yang Liu, and Zhen Yang
- Subjects
Protein subcellular localization ,Computer science ,Bioimage informatics ,Intracellular Space ,lcsh:Computer applications to medicine. Medical informatics ,Proteomics ,Biochemistry ,Frequency domain feature ,Protein Annotation ,Structural Biology ,Protein subcellular location ,Encoding (memory) ,Molecule ,Humans ,Databases, Protein ,lcsh:QH301-705.5 ,Molecular Biology ,Protein function ,business.industry ,Applied Mathematics ,Proteins ,Pattern recognition ,Genome project ,Filter (signal processing) ,Subcellular localization ,Protein subcellular localization prediction ,Multi-label classifier chain ,Computer Science Applications ,Monogenic signal ,Protein Transport ,Transformation (function) ,ComputingMethodologies_PATTERNRECOGNITION ,lcsh:Biology (General) ,Feature (computer vision) ,lcsh:R858-859.7 ,Artificial intelligence ,DNA microarray ,business ,Algorithms ,Research Article ,Image intensity encoding strategy - Abstract
Background: Protein subcellular localization plays a crucial role in understanding cell function. Proteins need to be in the right place at the right time, and combine with the corresponding molecules to fulfill their functions. Furthermore, prediction of protein subcellular location should not only play a guiding role in drug design and development, owing to potential molecular targets, but also play an essential role in genome annotation. Taking the current status of image-based protein subcellular localization as an example, there are three common drawbacks, i.e., obsolete datasets without updated label information, stereotypical feature descriptors on the spatial domain or grey level, and the limited capacity of single-function prediction algorithms, which handle only single-label databases. Results: In this paper, a novel human protein subcellular localization prediction model, MIC_Locator, is proposed. Firstly, the latest datasets are collected and collated as our benchmark dataset instead of obsolete data while training the prediction model. Secondly, Fourier transformation, Riesz transformation, Log-Gabor filter and intensity coding strategy are employed to obtain frequency features based on three components of the monogenic signal with different frequency scales. Thirdly, a chained prediction model is proposed to handle multi-label instead of single-label datasets. The experimental results showed that MIC_Locator can achieve 60.56% subset accuracy and outperform the majority of existing prediction models, and that the frequency feature and intensity coding strategy can be conducive to improving the classification accuracy. Conclusions: Our results demonstrate that the frequency feature is more beneficial for improving the performance of the model compared to features extracted from the spatial domain, and the MIC_Locator proposed in this paper can speed up validation of protein annotation, knowledge of protein function and proteomics research.
- Published
- 2019
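The subset accuracy reported in the abstract above is a strict multi-label metric: a protein counts as correct only when its predicted set of locations matches the true set exactly. A minimal sketch of the computation follows. The function name and the example label sets are hypothetical and are not taken from the MIC_Locator benchmark.

```python
from typing import Iterable, List


def subset_accuracy(
    true_labels: List[Iterable[str]], predicted_labels: List[Iterable[str]]
) -> float:
    """Fraction of samples whose predicted label set equals the true label set."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    exact_matches = sum(
        1 for truth, pred in zip(true_labels, predicted_labels)
        if set(truth) == set(pred)
    )
    return exact_matches / len(true_labels)


# Hypothetical example: four proteins, two of them predicted exactly.
truth = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondria"}, {"golgi"}]
predicted = [{"nucleus"}, {"nucleus"}, {"mitochondria"}, {"golgi", "cytoplasm"}]
score = subset_accuracy(truth, predicted)
```

A partially correct prediction, such as the second protein here, earns no credit, which is one reason subset accuracy values tend to look low next to single-label accuracies.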
45. Image-Based Human Protein Subcellular Location Prediction Using Local Tetra Patterns Descriptor
- Author
-
Fan Yang, Yang Liu, and Han Wei
- Subjects
Protein function ,Computer science ,business.industry ,Problem transformation ,Local feature descriptor ,Pattern recognition ,Multi label learning ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,Protein subcellular location ,Artificial intelligence ,business ,Classifier (UML) ,Image based - Abstract
Protein subcellular location has a huge positive influence on understanding protein function. In the past decades, many image-based automated approaches have been published for predicting protein subcellular location. However, in the reported literature, diverse prediction models share a common deficiency in capturing local information from the region of interest in an image. This motivates us to propose a novel approach by employing a local feature descriptor named the Local Tetra Patterns (LTrP). In this paper, local features together with global features were fed to a support vector machine to train chain classifiers, which can deal with multi-label datasets by using the problem transformation pattern. To verify the validity of our approach, three different experiments were conducted based on the same benchmark dataset. The results show that classification with the LTrP descriptor not only captured more local information in the region of interest of the images but also improved prediction precision, since the local descriptor is encoded along the horizontal and vertical directions by LTrP. By applying the new approach, a more accurate classifier of protein subcellular location can be modeled, which is crucial for screening cancer biomarkers and researching pathological mechanisms.
- Published
- 2019
- Full Text
- View/download PDF
46. Based on Gene Ontology Semantic Similarity Protein Subcellular Location Prediction
- Author
-
Xiangliang Zhang and Min Jin
- Subjects
Computational Mathematics ,Semantic similarity ,Protein subcellular location ,Computer science ,Gene ontology ,General Materials Science ,General Chemistry ,Computational biology ,Electrical and Electronic Engineering ,Condensed Matter Physics - Published
- 2015
- Full Text
- View/download PDF
47. Protein Subcellular Location Prediction Based on Pseudo Amino Acid Composition and PSI-Blast Profile
- Author
-
Yuhua Yao, Ping-An He, Qi Dai, Huimin Xu, Bo Liao, and Shoujiang Yan
- Subjects
Computational Mathematics ,Biochemistry ,Chemistry ,Protein subcellular location ,General Materials Science ,General Chemistry ,Electrical and Electronic Engineering ,Condensed Matter Physics ,Pseudo amino acid composition - Published
- 2015
- Full Text
- View/download PDF
48. Predicting Protein Subcellular Location Based on a Novel Sequence Numerical Model
- Author
-
Haowen Chen, Zhi Cao, Xia Chen, and Qingming Hu
- Subjects
Computational Mathematics ,Protein subcellular location ,General Materials Science ,General Chemistry ,Computational biology ,Electrical and Electronic Engineering ,Biology ,Condensed Matter Physics ,Bioinformatics ,Sequence (medicine) - Published
- 2015
- Full Text
- View/download PDF
49. Image-based classification of protein subcellular location patterns in human reproductive tissue by ensemble learning global and local features
- Author
-
Ying-Ying Xu, Shitong Wang, Fan Yang, and Hong-Bin Shen
- Subjects
Computer science ,Local binary patterns ,business.industry ,Cognitive Neuroscience ,Feature extraction ,Human Protein Atlas ,Pattern recognition ,Feature selection ,Subcellular localization ,Ensemble learning ,Computer Science Applications ,Support vector machine ,Artificial Intelligence ,Protein subcellular location ,Artificial intelligence ,business ,Classifier (UML) - Abstract
The reproductive system is a specific system of organs working together for the purpose of reproduction. As one of the most significant characteristics of human cells, subcellular localization plays a critical role in understanding specific functions of mammalian proteins. In this study, we developed a novel computational protocol for predicting protein subcellular locations from microscope images of cells in human reproductive tissues. Three major steps are contained in this protocol, i.e., protein object identification, image feature extraction, and classification. We first separated protein and DNA staining in the images with both linear and non-negative matrix factorization separation methods; then we extracted protein multi-view global and local texture features including wavelet Haralick, local binary patterns, local ternary patterns, and the local quinary patterns; finally, based on the selected important feature subset, we constructed an ensemble classifier with support vector machines for classification. Experiments are performed on a benchmark dataset consisting of seven major subcellular classes in human reproductive tissues collected from the Human Protein Atlas. Our results show that the local texture pattern features play an important complementary role to global features for enhancing the prediction performance. An overall accuracy of 85% is obtained with the current system, and when only confident classifications are considered, the accuracy can reach 99%. It is the first image-based protein subcellular location predictor developed specifically for human reproductive tissue. The promising results indicate that the developed protocol can be applied for accurate large-scale protein subcellular localization annotations in the human reproductive system.
- Published
- 2014
- Full Text
- View/download PDF
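Among the local texture features named in the abstract above, the local binary pattern (LBP) is the simplest: each pixel is compared with its eight neighbours, and the comparison results are packed into an 8-bit code whose histogram serves as the feature vector. The sketch below shows the basic 3x3 variant on a grayscale image stored as nested lists. The neighbour ordering, the `>=` threshold convention, and the toy images are assumptions for illustration; the paper's exact LBP configuration is not given in the abstract.

```python
from typing import List

Image = List[List[int]]

# Clockwise neighbour offsets starting at the top-left pixel (a convention choice).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Image, row: int, col: int) -> int:
    """8-bit LBP code of an interior pixel: bit i is set when the i-th
    neighbour is at least as bright as the centre pixel."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


# Toy images (hypothetical intensities).
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
corner = [[9, 0, 0], [0, 5, 0], [0, 0, 0]]
```

Local ternary and quinary patterns extend the same comparison from two levels to three and five, respectively, at the cost of a larger code space.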
50. Predicting protein subcellular location with network embedding and enrichment features.
- Author
-
Pan, Xiaoyong, Lu, Lin, and Cai, Yu-Dong
- Subjects
- *
RECURRENT neural networks , *ARTIFICIAL neural networks , *DECISION trees , *AMINO acid sequence , *FEATURE selection , *PROTEINS - Abstract
The subcellular location of a protein is highly related to its function. Identifying the location of a given protein is an essential step for investigating its related problems. Traditional experimental methods can produce solid determination. However, their limitations, such as high cost and low efficiency, are evident. Computational methods provide an alternative means to address these problems. Most previous methods extract features from protein sequences or structures for building prediction models. In this study, we use two types of features and combine them to construct the model. The first feature type is extracted from a protein–protein interaction network to abstract the relationship between the encoded protein and other proteins. The second type is obtained from gene ontology and biological pathways to indicate the existing functions of the encoded protein. These features are analyzed using some feature selection methods. The final optimum features are adopted to build the model with a recurrent neural network as the classification algorithm. This model yields good performance with a Matthews correlation coefficient of 0.844. A decision tree is used as a rule learning classifier to extract decision rules. Although the performance of decision rules is poor, they are valuable in revealing the molecular mechanism of proteins with different subcellular locations. The final analysis confirms the reliability of the extracted rules. The source code of the proposed method is freely available at https://github.com/xypan1232/rnnloc • Extract protein embeddings from a protein-protein network using Node2Vec. • Combine the protein embedding with enrichment features derived from functional data. • Train recurrent neural network to classify 16 localizations with multiple steps of feature selection. • Learn classification rules of protein localizations using decision trees. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
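The abstract above reports its result as a Matthews correlation coefficient (MCC) over 16 localizations. One common multi-class generalisation can be computed from four quantities: the number of samples, the number of correct predictions, and the per-class counts of true and predicted labels. The sketch below implements that generalisation. Whether the paper uses exactly this variant is an assumption, since the abstract does not say, and the label values are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List


def multiclass_mcc(y_true: List[str], y_pred: List[str]) -> float:
    """Multi-class MCC from label counts; returns 0.0 when the denominator
    vanishes (for example when only one class is ever predicted)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)
    numerator = correct * total - sum(true_counts[k] * pred_counts[k] for k in labels)
    true_term = total * total - sum(n * n for n in true_counts.values())
    pred_term = total * total - sum(n * n for n in pred_counts.values())
    denominator = sqrt(true_term * pred_term)
    return numerator / denominator if denominator else 0.0


# Hypothetical labels for a two-class check.
labels_true = ["nucleus", "nucleus", "cytosol", "cytosol"]
perfect = multiclass_mcc(labels_true, labels_true)
inverted = multiclass_mcc(labels_true, ["cytosol", "cytosol", "nucleus", "nucleus"])
```

Unlike plain accuracy, the coefficient stays near zero for a predictor that always outputs the largest class, which makes it a reasonable choice for the imbalanced location classes described in these abstracts.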