37 results on '"protein subcellular location"'
Search Results
2. Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks.
- Author
Wang, Ge, Xue, Min-Qi, Shen, Hong-Bin, and Xu, Ying-Ying
- Subjects
*AMINO acid sequence, *PROTEINS, *DEEP learning, *PROTEIN expression, *PROTEIN-protein interactions
- Abstract
Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, producing many automated predictors of protein subcellular localization. However, most of these predictors are trained solely on high-throughput microscopic images or solely on amino acid sequences, and the integration of heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator) that constructs a multi-view description of proteins by integrating multiple data types, including images of protein expression in cells or tissues, amino acid sequences and protein–protein interaction networks, to classify protein subcellular location patterns. Proteins were encoded by both handcrafted and deep learning features, and multiple combining methods were implemented. Our experimental results indicate that optimal integration can considerably enhance classification accuracy, and the utility of SIN-Locator was demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we investigate the contribution of different data sources and the influence of partially missing data. This work is anticipated to provide clues for reconciling and combining multi-source data for protein location analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
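The abstract mentions several combining methods and the influence of partially missing data without detailing them. One common late-fusion rule, a weighted average of per-view class probabilities that skips views with no data, can be sketched as follows; the function name, view weights and scores are hypothetical, and this is not claimed to be SIN-Locator's actual method.

```python
from typing import Dict, List, Optional

def fuse_views(view_probs: Dict[str, Optional[List[float]]],
               weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-view class-probability vectors.

    Views whose prediction is None (data absent) are skipped, and the
    remaining weights are renormalised so the fused vector still sums to 1.
    """
    present = {v: p for v, p in view_probs.items() if p is not None}
    if not present:
        raise ValueError("at least one view must be present")
    total_w = sum(weights[v] for v in present)
    n_classes = len(next(iter(present.values())))
    fused = [0.0] * n_classes
    for view, probs in present.items():
        if len(probs) != n_classes:
            raise ValueError("all views must score the same classes")
        w = weights[view] / total_w
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused

# Hypothetical scores over three location classes; the network view is absent.
fused = fuse_views(
    {"image": [0.6, 0.3, 0.1], "sequence": [0.2, 0.5, 0.3], "network": None},
    weights={"image": 0.5, "sequence": 0.3, "network": 0.2},
)
```

Renormalising over the views that are present lets the same rule score proteins that lack, for example, interaction-network data.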
3. Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer
- Author
Zhen-Zhen Xue, Yanxia Wu, Qing-Zu Gao, Liang Zhao, and Ying-Ying Xu
- Subjects
Bioimage processing, Bioinformatics, Machine learning, Protein subcellular location, Cancer biomarkers, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made to measure abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting diseases, have rarely been studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classifying new proteins. Two validation datasets of colon cancer biomarkers, derived from the published literature and from the Human Protein Atlas database respectively, are employed. In these datasets, 81.82% and 65.66% of the biomarker proteins, respectively, were identified as changing location. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and that our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localizations of proteins and discovering new potential location biomarkers.
- Published
- 2020
4. Predicting Human Protein Subcellular Locations by Using a Combination of Network and Function Features.
- Author
Chen, Lei, Li, ZhanDong, Zeng, Tao, Zhang, Yu-Hang, Zhang, ShiQi, Huang, Tao, and Cai, Yu-Dong
- Subjects
FEATURE selection, PROTEIN-protein interactions, MACHINE learning, PROTEINS, CLASSIFICATION algorithms, POLYMER networks
- Abstract
Owing to technological limitations, the subcellular localizations of proteins are difficult to identify. It is therefore necessary to predict the subcellular localization and intercellular distribution patterns of proteins in accordance with their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics. The computational prediction of protein subcellular localizations can be performed on the basis of sequence and functional characteristics. In this study, the protein–protein interaction network, functional annotations of proteins and a group of direct proteins with known subcellular localization were used to construct models. To build efficient models, several powerful machine learning algorithms, including two feature selection methods and four classification algorithms, were employed. Some key proteins and functional terms that may contribute importantly to determining protein subcellular locations were discovered. Furthermore, some quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model that uses direct protein annotation information (i.e., functional features) and a STRING-based protein–protein interaction network (i.e., network features), our computational model can help promote the development of predictive technologies for subcellular localization and provides a new approach for exploring protein subcellular localization patterns and their potential biological importance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
5. PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection.
- Author
Ullah, Matee, Han, Ke, Hadi, Fazal, Xu, Jian, Song, Jiangning, and Yu, Dong-Jun
- Subjects
*FEATURE selection, *DEEP learning, *RADIAL basis functions, *PROTEOMICS, *SUPPORT vector machines, *DISCRIMINANT analysis
- Abstract
Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Accurate identification of protein subcellular location is therefore an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods are limited in overall accuracy, time consumption and generalization power. To address these problems, we developed a novel computational approach based on Human Protein Atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted and deep learned (using a pretrained deep learning model) features from different viewpoints of the image. The stepwise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each raw feature set. To obtain a more informative feature subset, the support vector machine–based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was then applied to the integrated feature set. Finally, classification models, namely a support vector machine with a radial basis function kernel (SVM-RBF) and a support vector machine with a linear kernel (SVM-LNR), were trained on the final selected feature set. To evaluate the performance of the proposed method, a new gold-standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the best performance in 10-fold cross-validation on this dataset and outperformed existing predictors. We also illustrated the generalization ability of the proposed method with a stringent independent validation test. [ABSTRACT FROM AUTHOR]
- Published
- 2021
6. Predicting Human Protein Subcellular Locations by Using a Combination of Network and Function Features
- Author
Lei Chen, ZhanDong Li, Tao Zeng, Yu-Hang Zhang, ShiQi Zhang, Tao Huang, and Yu-Dong Cai
- Subjects
protein subcellular location, protein-protein interaction network, GO enrichment, KEGG enrichment, feature selection, classification algorithm, Genetics, QH426-470
- Abstract
Owing to technological limitations, the subcellular localizations of proteins are difficult to identify. It is therefore necessary to predict the subcellular localization and intercellular distribution patterns of proteins in accordance with their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics. The computational prediction of protein subcellular localizations can be performed on the basis of sequence and functional characteristics. In this study, the protein–protein interaction network, functional annotations of proteins and a group of direct proteins with known subcellular localization were used to construct models. To build efficient models, several powerful machine learning algorithms, including two feature selection methods and four classification algorithms, were employed. Some key proteins and functional terms that may contribute importantly to determining protein subcellular locations were discovered. Furthermore, some quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model that uses direct protein annotation information (i.e., functional features) and a STRING-based protein–protein interaction network (i.e., network features), our computational model can help promote the development of predictive technologies for subcellular localization and provides a new approach for exploring protein subcellular localization patterns and their potential biological importance.
- Published
- 2021
7. Protein subcellular localization based on deep image features and criterion learning strategy.
- Author
Su, Ran, He, Linlin, Liu, Tianling, Liu, Xiaofeng, and Wei, Leyi
- Subjects
*LEARNING strategies, *CONVOLUTIONAL neural networks, *IMMOBILIZED proteins, *HUMAN biology, *PROTEINS
- Abstract
The spatial distribution of the proteome at the subcellular level provides clues to protein function and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at the subcellular level. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction is a particularly challenging task. Here we developed a criterion learning strategy to exploit label–attribute relevancy and label–label relevancy. The criterion used to determine the final label set is obtained automatically during the learning procedure. We identified an optimal CNN architecture that gave the best results. Experiments also show that, compared with handcrafted features, deep features give more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
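The criterion learning strategy itself models label–attribute and label–label relevancy, and the abstract does not give enough detail to reproduce it. As a much simpler stand-in, the sketch below shows the general idea of learning a multi-label decision rule from data: a single score threshold chosen to maximise mean Jaccard similarity on validation data. All names, scores and candidate thresholds are hypothetical.

```python
from typing import List, Sequence, Set

def predict_labels(scores: Sequence[float], threshold: float) -> Set[int]:
    """Labels scoring at least `threshold`; falls back to the top label so the set is never empty."""
    chosen = {k for k, s in enumerate(scores) if s >= threshold}
    if not chosen:
        chosen = {max(range(len(scores)), key=lambda k: scores[k])}
    return chosen

def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between predicted and true label sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def tune_threshold(val_scores: List[Sequence[float]],
                   val_labels: List[Set[int]],
                   candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest mean Jaccard score on validation data."""
    def mean_jaccard(t: float) -> float:
        pairs = zip(val_scores, val_labels)
        return sum(jaccard(predict_labels(s, t), y) for s, y in pairs) / len(val_labels)
    return max(candidates, key=mean_jaccard)

# Hypothetical validation scores for three location classes and their true label sets.
val_scores = [[0.9, 0.6, 0.1], [0.8, 0.2, 0.1], [0.3, 0.7, 0.65]]
val_labels = [{0, 1}, {0}, {1, 2}]
best_t = tune_threshold(val_scores, val_labels, candidates=[0.3, 0.5, 0.7])
```

On this toy data a threshold of 0.5 recovers every true label set, while 0.3 adds a spurious label and 0.7 drops true ones.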
8. Consistency and variation of protein subcellular location annotations.
- Author
Xu, Ying‐Ying, Zhou, Hang, Murphy, Robert F., and Shen, Hong‐Bin
- Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human‐interpreted rather than primary data. For example, the Swiss‐Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high‐resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss‐Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss‐Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening. [ABSTRACT FROM AUTHOR]
- Published
- 2021
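The simplest level of such a comparison can be sketched by tallying whether the per-protein location sets from two sources are identical, partially overlapping or disjoint. This minimal sketch assumes both databases have already been mapped to one shared location vocabulary, which is itself a nontrivial step; the protein IDs and annotations are invented.

```python
from collections import Counter
from typing import Dict, Set

def consistency_level(a: Set[str], b: Set[str]) -> str:
    """Classify a pair of location sets as identical, partially overlapping, or disjoint."""
    if a == b:
        return "identical"
    if a & b:
        return "partial"
    return "disjoint"

def compare_databases(db1: Dict[str, Set[str]], db2: Dict[str, Set[str]]) -> Counter:
    """Tally consistency levels over the proteins annotated in both databases."""
    shared = db1.keys() & db2.keys()
    return Counter(consistency_level(db1[p], db2[p]) for p in shared)

# Hypothetical annotations, already mapped to one shared location vocabulary.
hpa = {"P1": {"nucleus"}, "P2": {"cytosol", "nucleus"},
       "P3": {"mitochondria"}, "P4": {"golgi apparatus"}}
swissprot = {"P1": {"nucleus"}, "P2": {"nucleus"}, "P3": {"plasma membrane"}}
summary = compare_databases(hpa, swissprot)
```

Proteins annotated in only one source (P4 here) are excluded, since consistency is undefined for them.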
9. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks.
- Author
Liu, Guang-Hui, Zhang, Bei-Wei, Qian, Gang, Wang, Bin, Mao, Bo, and Bichindaritz, Isabelle
- Abstract
Prediction of protein subcellular location has become an active research topic because it has proven useful for understanding disease mechanisms and for novel drug design. With the rapid development of automated microscopic imaging technology in recent years, bioimage-based classification of protein subcellular location has attracted considerable attention, because images can describe protein distribution intuitively and in detail. In the current study, a prediction method for protein subcellular location was proposed based on multi-view image features extracted from three different views: four texture features of the original image; global and local features of the protein, extracted from the protein channel images after color segmentation; and global features of DNA, extracted from the DNA channel image. The extracted features were then combined to improve the performance of subcellular localization prediction, and the best ensemble features were identified by comparing different feature combinations under the same classifier. A classifier based on stacked autoencoders and random forest was also proposed; to improve the prediction results, the deep network was combined with traditional statistical classification methods. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
10. Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer.
- Author
Xue, Zhen-Zhen, Wu, Yanxia, Gao, Qing-Zu, Zhao, Liang, and Xu, Ying-Ying
- Subjects
COLON cancer, HISTOCHEMISTRY, BIOLOGICAL tags, IDENTITY (Psychology), TUMOR classification, PROTEINS, COLON (Anatomy)
- Abstract
Background: Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made to measure abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting diseases, have rarely been studied. Results: In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classifying new proteins. Two validation datasets of colon cancer biomarkers, derived from the published literature and from the Human Protein Atlas database respectively, are employed. In these datasets, 81.82% and 65.66% of the biomarker proteins, respectively, were identified as changing location. Conclusions: Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and that our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localizations of proteins and discovering new potential location biomarkers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
11. Active machine learning-driven experimentation to determine compound effects on protein patterns
- Author
Armaghan W Naik, Joshua D Kangas, Devin P Sullivan, and Robert F Murphy
- Subjects
active learning, protein subcellular location, laboratory automation, high content screening, automation of research, machine learning, Medicine, Science, Biology (General), QH301-705.5
- Abstract
High-throughput screening determines the effects of many conditions on a given biological target. Currently, estimating the effects of those conditions on other targets requires either strong modeling assumptions (e.g., similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without doing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance.
- Published
- 2016
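As a worked check of the abstract's figures, assuming the space of possible experiments is every compound–protein pair (which the abstract implies but does not state), 29% of the 48 × 48 grid corresponds to roughly 668 experiments:

$$48 \times 48 = 2304, \qquad 0.29 \times 2304 \approx 668$$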
12. Consistency and variation of protein subcellular location annotations
- Author
Robert F. Murphy, Hong-Bin Shen, Hang Zhou, and Ying-Ying Xu
- Subjects
Biomarker identification ,Human Protein Atlas ,Computational biology ,Variation (game tree) ,Biology ,Biochemistry ,Article ,Cell Line ,Protein location ,Variable locations ,03 medical and health sciences ,Consistency (database systems) ,Atlases as Topic ,Protein sequencing ,Structural Biology ,Protein subcellular location ,Humans ,Databases, Protein ,Molecular Biology ,030304 developmental biology ,Observer Variation ,0303 health sciences ,030302 biochemistry & molecular biology ,Uncertainty ,Proteins ,Reproducibility of Results ,Molecular Sequence Annotation ,Cell Compartmentation ,Eukaryotic Cells
- Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.
- Published
- 2020
13. SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks
- Author
Jeremy C. Simpson, Gianluca Pollastri, Manaz Kaleel, Xuanming Feng, Yandan Zheng, Catherine Mooney, and Jialiang Chen
- Subjects
Statistics and Probability ,Computer science ,Computational biology ,Secretory pathway ,Biochemistry ,Convolutional neural network ,Machine Learning ,03 medical and health sciences ,Protein subcellular location ,Machine learning ,Protein function prediction ,Endomembrane system ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Secretory Pathway ,Artificial neural network ,030302 biochemistry & molecular biology ,Computational Biology ,Proteins ,A protein ,Subcellular localization ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Neural Networks, Computer ,Algorithms ,Neural networks
- Abstract
Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is expensive and time-consuming. Therefore, various computer-based tools, mostly using machine learning algorithms, have been developed to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS assigns a protein to one of two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. Contact: catherine.mooney@ucd.ie
- Published
- 2020
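SCLpred-EMS reports performance as a Matthews correlation coefficient (MCC). For a binary task such as endomembrane/secretory versus all others, MCC is computed from the confusion-matrix counts as below; the counts in the usage line are invented for illustration.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is reported as 0 when any row or column sum is 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts: positives = endomembrane/secretory proteins, negatives = all others.
mcc = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
```

With these counts the MCC is roughly 0.70. MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when the two classes are imbalanced.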
14. Text as data: Using text-based features for proteins representation and for computational prediction of their characteristics.
- Author
Shatkay, Hagit, Brady, Scott, and Wong, Andrew
- Subjects
*NUCLEOTIDE sequencing, *PREDICTION models, *TEXT mining, *BIOINFORMATICS, *MACHINE learning
- Abstract
The current era of large-scale biology is characterized by a fast-paced growth in the number of sequenced genomes and, consequently, by a multitude of identified proteins whose function has yet to be determined. Simultaneously, any known or postulated information concerning genes and proteins is part of the ever-growing published scientific literature, which is expanding at a rate of over a million new publications per year. Computational tools that attempt to automatically predict and annotate protein characteristics, such as function and localization patterns, are being developed along with systems that aim to support the process via text mining. Most work on protein characterization focuses on features derived directly from protein sequence data. Protein-related work that does aim to utilize the literature typically concentrates on extracting specific facts (e.g., protein interactions) from text. In the past few years we have taken a different route, treating the literature as a source of text-based features, which can be employed just as sequence-based protein-features were used in earlier work, for predicting protein subcellular location and possibly also function. We discuss here in detail the overall approach, along with results from work we have done in this area demonstrating the value of this method and its potential use. [ABSTRACT FROM AUTHOR]
- Published
- 2015
15. Automated analysis of immunohistochemistry images identifies candidate location biomarkers for cancers.
- Author
Kumar, Aparna, Rao, Arvind, Bhavani, Santosh, Newberg, Justin Y., and Murphy, Robert F.
- Subjects
*BIOMARKERS, *IMMUNOCHEMISTRY, *IMMUNOHISTOCHEMISTRY, *HISTOCHEMISTRY, *BIOCHEMISTRY, *CANCER
- Abstract
Molecular biomarkers are changes measured in biological samples that reflect disease states. Such markers can help clinicians identify types of cancer or stages of progression, and they can guide in tailoring specific therapies. Many efforts to identify biomarkers consider genes that mutate between normal and cancerous tissues or changes in protein or RNA expression levels. Here we define location biomarkers, proteins that undergo changes in subcellular location that are indicative of disease. To discover such biomarkers, we have developed an automated pipeline to compare the subcellular location of proteins between two sets of immunohistochemistry images. We used the pipeline to compare images of healthy and tumor tissue from the Human Protein Atlas, ranking hundreds of proteins in breast, liver, prostate, and bladder based on how much their location was estimated to have changed. The performance of the system was evaluated by determining whether proteins previously known to change location in tumors were ranked highly. We present a number of candidate location biomarkers for each tissue, and identify biochemical pathways that are enriched in proteins that change location. The analysis technology is anticipated to be useful not only for discovering new location biomarkers but also for enabling automated analysis of biomarker distributions as an aid to determining diagnosis. [ABSTRACT FROM AUTHOR]
- Published
- 2014
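The abstract ranks proteins by how much their estimated location changed between healthy and tumor images. A much-simplified sketch of such a ranking follows, assuming each image has already been reduced to a numeric feature vector and using the Euclidean distance between per-condition mean vectors as the change score; the published pipeline's actual features and comparison method are not reproduced here, and the protein names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vectors = Sequence[Sequence[float]]

def mean_vector(vectors: Vectors) -> List[float]:
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rank_by_location_change(images: Dict[str, Tuple[Vectors, Vectors]]) -> List[Tuple[str, float]]:
    """Rank proteins by distance between mean normal and mean tumor features, largest first."""
    scores = {
        protein: math.dist(mean_vector(normal), mean_vector(tumor))
        for protein, (normal, tumor) in images.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical 2-D features (e.g. nuclear vs cytoplasmic staining fractions) per image.
images = {
    "PROT_A": ([[0.1, 0.9], [0.2, 0.8]], [[0.8, 0.2], [0.9, 0.1]]),
    "PROT_B": ([[0.5, 0.5]], [[0.6, 0.4]]),
}
ranking = rank_by_location_change(images)
```

Here PROT_A, whose staining shifts from mostly one compartment to mostly the other, ranks above PROT_B. A real pipeline would also need to account for image-to-image variability before calling a change significant.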
16. Automated Protein Subcellular Localization Based on Local Invariant Features.
- Author
Li, Chao, Wang, Xue-hong, Zheng, Li, and Huang, Ji-feng
- Subjects
*SUBCELLULAR fractionation, *PROTEIN analysis, *FLUORESCENCE microscopy, *MOLECULAR rotation, *CELL analysis
- Abstract
Understanding the function of encoded proteins requires knowing their subcellular locations. The most common method for determining subcellular location is fluorescence microscopy, which allows subcellular localizations to be imaged in high throughput. Image feature calculation has proven invaluable in the automated analysis of cellular images. This article proposes LDPs, a novel translation- and rotation-invariant feature extraction method that counts the local difference features of an image; each difference feature is the D-value (difference) between the gray value of a central pixel c and the gray values of its eight neighboring pixels. The method was tested on two image sets: in the first, fluorescently tagged proteins were endogenously expressed in 10 subcellular locations; in the second, transfected proteins covered 11 locations. An SVM was trained and tested on each image set, giving classification accuracies of 96.7% and 92.3% on the endogenous and transfected sets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2013
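The abstract defines each local difference feature as the D-value between a central pixel c and its eight neighbors. A minimal sketch of one way to turn those differences into a feature vector follows; pooling them into a fixed-width histogram and the default bin count are assumptions, not details given in the abstract. Because differences from all interior pixels and all eight directions are pooled, the histogram ignores neighbor order and pixel position, so rotating the image by 90° leaves it unchanged.

```python
from typing import List, Sequence

# The eight neighbour offsets (row, column) around a centre pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def local_difference_histogram(img: Sequence[Sequence[int]], n_bins: int = 16) -> List[float]:
    """Normalised histogram of neighbour-minus-centre grey-level differences.

    `img` holds 8-bit grey values (0-255). Differences from all eight
    neighbours of every interior pixel are pooled into one histogram.
    """
    h, w = len(img), len(img[0])
    counts = [0] * n_bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = img[r][c]
            for dr, dc in OFFSETS:
                d = img[r + dr][c + dc] - centre        # D-value, in [-255, 255]
                counts[(d + 255) * n_bins // 511] += 1  # map to bins 0..n_bins-1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins  # image too small to have interior pixels
    return [k / total for k in counts]

# Usage: a toy 4x4 image and its 90-degree clockwise rotation give the same histogram.
img = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
rotated = [list(row) for row in zip(*img[::-1])]
```

The resulting vector could then be fed to a classifier such as the SVM mentioned in the abstract.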
17. Application of PCA method to predicting protein subcellular location.
- Author
Ma Jun-wei, Shi Duo, Gu Hong, and Zhang Jie
- Subjects
PRINCIPAL components analysis ,PROTEINS ,ALGORITHMS ,AMINO acids ,BIOLOGY - Abstract
The subcellular location of a protein is closely correlated with its biological function. With the rapid expansion of protein databases, it is very important to design powerful high-throughput algorithms for predicting protein subcellular location. Many prediction tools have been designed based on the pseudo-amino acid composition (PseAAC), but they require the optimal value of λ, which reflects sequence-order effects, to be determined in advance. Here, principal component analysis (PCA) is applied to address this problem. First, the parameter λ is set to its maximum to capture more sequence-order information; then, PCA is employed to extract the essential features. Experimental results show that the proposed method solves the above problem and performs better than other predictors. [ABSTRACT FROM AUTHOR]
- Published
- 2012
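The abstract sets λ to its maximum and then uses PCA to keep only the essential features. A minimal standard-library sketch of PCA's core step, finding the leading principal component by power iteration on the sample covariance matrix and projecting samples onto it, is shown below on invented 2-D data rather than real PseAAC vectors; a practical pipeline would keep several components, typically via a linear-algebra library.

```python
import math
from typing import List, Sequence

def column_means(data: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each feature (column) across samples."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]

def first_principal_component(data: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Unit vector along the direction of greatest variance, via power iteration."""
    n, d = len(data), len(data[0])
    means = column_means(data)
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v; keep the current vector
        v = [x / norm for x in w]
    return v

def project(data: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """Score of each mean-centred sample on one component (one reduced feature per sample)."""
    means = column_means(data)
    return [sum((x - m) * c for x, m, c in zip(row, means, component)) for row in data]

# Invented data lying on the line y = 2x, so the first component is (1, 2) / sqrt(5).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc = first_principal_component(data)
scores = project(data, pc)
```

Each projected score replaces the original feature vector of a sample, which is how PCA reduces a high-dimensional PseAAC vector to a few essential features.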
18. CE-PLoc: An ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition
- Author
Khan, Asifullah, Majid, Abdul, and Hayat, Maqsood
- Subjects
*PROTEINS, *AMINO acids, *PEPTIDES, *PROTEIN-protein interactions, *PREDICTION theory, *SUPPORT vector machines
- Abstract
Precise information about protein locations in a cell facilitates understanding of the function of a protein and its interactions in the cellular environment. This information further helps in the study of specific metabolic pathways and other biological processes. We propose an ensemble approach called “CE-PLoc” for predicting subcellular locations based on the fusion of individual classifiers. The proposed approach uses features obtained from both dipeptide composition (DC) and amphiphilic pseudo amino acid composition (PseAAC) feature extraction strategies. Different feature spaces are obtained by varying the dimensionality of PseAAC for a selected base learner. The performance of individual learning mechanisms, such as support vector machine, nearest neighbor, probabilistic neural network and covariant discriminant, trained on PseAAC-based features is first analyzed. Classifiers are then developed using the same learning mechanism but trained on PseAAC-based feature spaces of varying dimensions; combining these classifiers through a voting strategy improves prediction performance. Prediction performance is further enhanced by developing CE-PLoc through the combination of different learning mechanisms trained on both the DC-based feature space and PseAAC-based feature spaces of varying dimensions. The predictive performance of CE-PLoc is evaluated on two benchmark datasets of protein subcellular locations using accuracy, MCC and Q-statistics. Using the jackknife test, prediction accuracies of 81.47% and 83.99% are obtained for the 12- and 14-location datasets, respectively. In the independent dataset test, prediction accuracies are 87.04% and 87.33% for the 12- and 14-class datasets, respectively. [Copyright © Elsevier]
- Published
- 2011
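Two ingredients named in the abstract are easy to sketch: the 400-dimensional dipeptide composition (DC) vector and majority voting over classifier outputs. The sketch below assumes the 20 standard amino acids, skips pairs containing any other symbol, and uses plain unweighted voting; it omits PseAAC and the specific base learners that CE-PLoc combines, and the example sequence and predictions are invented.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def dipeptide_composition(seq: str) -> List[float]:
    """Relative frequency of each of the 400 dipeptides over overlapping residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    n = len(valid)
    if n == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(valid)
    return [counts[d] / n for d in DIPEPTIDES]

def majority_vote(predictions: Sequence[str]) -> str:
    """Most frequent predicted location; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]

# Usage with a toy sequence and three hypothetical classifier outputs.
features = dipeptide_composition("ACDA")
location = majority_vote(["nucleus", "cytoplasm", "nucleus"])
```

Voting only needs each base classifier's predicted label, so classifiers trained on feature spaces of different dimensions (DC or PseAAC) can be combined without rescaling their outputs.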
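The CE-PLoc abstract above describes fusing base classifiers through a voting strategy. A minimal sketch of plain majority voting over per-protein predictions follows; the function names, the label strings and the tie-breaking rule (the tied label voted first wins) are illustrative assumptions, not the authors' implementation, which may weight or order votes differently.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the location label predicted by the most base classifiers.

    Ties go to the tied label that appears first in `votes` (classifier
    order), a hypothetical rule chosen for this sketch.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    best = max(counts.values())
    return next(label for label in votes if counts[label] == best)


def ensemble_predict(per_classifier: Sequence[Sequence[str]]) -> list[str]:
    """Fuse predictions; per_classifier[k][i] is classifier k's label for protein i."""
    # zip(*...) regroups the votes so each tuple holds all votes for one protein.
    return [majority_vote(column) for column in zip(*per_classifier)]


# Three hypothetical base classifiers (e.g. SVM, k-NN, PNN) on three proteins.
predictions = [
    ["nuclear", "cytoplasmic", "mitochondrial"],
    ["nuclear", "nuclear", "mitochondrial"],
    ["cytoplasmic", "cytoplasmic", "nuclear"],
]
print(ensemble_predict(predictions))  # ['nuclear', 'cytoplasmic', 'mitochondrial']
```

Unweighted voting is the simplest fusion rule; weighting each classifier by its validation accuracy is a common variant.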
19. LAB-Secretome: a genome-scale comparative analysis of the predicted extracellular and surface-associated proteins of Lactic Acid Bacteria.
- Author
Zhou, Miaomiao, Theunissen, Daniel, Wels, Michiel, and Siezen, Roland J
- Subjects
*LACTIC acid bacteria; *BACTERIAL genomes; *MOLECULAR evolution; *COMPARATIVE studies; *COMPARATIVE genomics; *PROTEINS; *CELL metabolism
- Abstract
Background: In Lactic Acid Bacteria (LAB), the extracellular and surface-associated proteins can be involved in processes such as cell wall metabolism, degradation and uptake of nutrients, communication, and binding to substrates or hosts. A genome-scale comparative study of these proteins (secretomes) can provide extensive information on the molecular evolution, diversity, function and adaptation of LAB to their specific environmental niches. Results: We have performed an extensive prediction and comparison of the secretomes from 26 sequenced LAB genomes. A new approach to detect homolog clusters of secretome proteins (LaCOGs) was designed by integrating protein subcellular location prediction and homology clustering methods. The initial clusters were further adjusted semi-manually based on multiple sequence alignments, domain compositions, pseudogene analysis and biological function of the proteins. Ubiquitous protein families were identified, as well as species-specific, strain-specific, and niche-specific LaCOGs. Comparative analysis of protein subfamilies has shown that the distribution and functional specificity of LaCOGs could be used to explain many niche-specific phenotypes. A comprehensive and user-friendly database, LAB-Secretome, was constructed to store, visualize and update the extracellular proteins and LaCOGs (http://www.cmbi.ru.nl/lab%5fsecretome/). This database will be updated regularly when new bacterial genomes become available. Conclusions: The LAB-Secretome database could be used to understand the evolution and adaptation of lactic acid bacteria to their environmental niches, to improve protein functional annotation and to serve as a basis for targeted experimental studies. [ABSTRACT FROM AUTHOR]
- Published
- 2010
20. Predicting protein subcellular location: exploiting amino acid based sequence of feature spaces and fusion of diverse classifiers.
- Author
Khan, Asifullah, Majid, Abdul, and Tae-Sun Choi
- Subjects
*AMINO acids; *AMINO compounds; *PROTEINS; *JACKKNIFE (Statistics); *ORGANIC acids
- Abstract
A novel approach, CE-PLoc, is proposed for predicting protein subcellular locations by exploiting diversity in both the feature and decision spaces. Diversity in a sequence of feature spaces is exploited using the hydrophobicity and hydrophilicity of amphiphilic pseudo amino acid composition with a specific learning mechanism. Diversity in learning mechanisms is exploited by fusing classifiers based on different learning mechanisms. A significant improvement in prediction performance is observed in both jackknife and independent dataset tests. [ABSTRACT FROM AUTHOR]
- Published
- 2010
21. Prediction of protein subcellular location using a combined feature of sequence
- Author
Gao, Qing-Bin, Wang, Zheng-Zhi, Yan, Chun, and Du, Yao-Hua
- Subjects
*AMINO acids; *BIOMOLECULES; *EUKARYOTIC cells; *PROTEINS
- Abstract
To understand the structure and function of a protein, an important task is to know where it occurs in the cell. Thus, a computational method for properly predicting the subcellular location of proteins would be significant in interpreting the original data produced by large-scale genome sequencing projects. The present work explores an effective method for extracting features from the protein primary sequence and a novel measure of similarity among proteins for classifying a protein into its proper subcellular location. We considered four locations in eukaryotic cells and three locations in prokaryotic cells, which have been investigated by several groups in the past. A combined feature of the primary sequence, defined as a 430D (dimensional) vector, was used to represent a protein, comprising 20 amino acid compositions, 400 dipeptide compositions and 10 physicochemical properties. To evaluate the prediction performance of this encoding scheme, a jackknife test based on the nearest neighbor algorithm was employed. The prediction accuracies for cytoplasmic, extracellular, mitochondrial, and nuclear proteins in the former dataset were 86.3%, 89.2%, 73.5% and 89.4%, respectively, and the total prediction accuracy reached 86.3%. For cytoplasmic, extracellular, and periplasmic proteins in the latter dataset, the prediction accuracies were 97.4%, 86.0%, and 79.7%, respectively, and a total prediction accuracy of 92.5% was achieved. The results indicate that this method outperforms some existing approaches based on amino acid composition, or on amino acid composition and dipeptide composition. [Copyright © Elsevier]
- Published
- 2005
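The 430-D encoding in the Gao et al. abstract concatenates 20 amino acid composition values, 400 dipeptide composition values and 10 physicochemical properties, and is scored with a jackknife (leave-one-out) test using a nearest neighbor classifier. The sketch below builds only the 420 composition features, because the abstract does not list the 10 physicochemical properties, and it substitutes plain Euclidean distance for the paper's own similarity measure. Both simplifications, and the toy sequences, are assumptions.

```python
from __future__ import annotations

import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def composition_features(seq: str) -> list[float]:
    """20 amino acid frequencies followed by 400 dipeptide frequencies (420 values)."""
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)  # drop non-standard residues
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    aac = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    n_pairs = max(len(seq) - 1, 1)  # avoid division by zero for length-1 sequences
    dpc = [counts[d] / n_pairs for d in DIPEPTIDES]
    return aac + dpc


def jackknife_1nn_accuracy(features: list[list[float]], labels: list[str]) -> float:
    """Leave each protein out in turn and label it by its nearest remaining neighbour."""
    correct = 0
    for i, x in enumerate(features):
        others = (j for j in range(len(features)) if j != i)
        nearest = min(others, key=lambda j: math.dist(x, features[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(features)


seqs = ["AAAAKK", "AAAAKA", "LLLLEE", "LLLEEL"]  # toy sequences, not real proteins
labels = ["nuclear", "nuclear", "cytoplasmic", "cytoplasmic"]
feats = [composition_features(s) for s in seqs]
print(len(feats[0]), jackknife_1nn_accuracy(feats, labels))  # 420 1.0
```

Appending the 10 physicochemical values to each vector would give the full 430-D representation, once those properties are known.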
22. Prediction of protein subcellular locations by GO–FunD–PseAA predictor
- Author
Chou, Kuo-Chen and Cai, Yu-Dong
- Subjects
*PROTEINS; *ONTOLOGY; *AMINO acids; *BIOINFORMATICS
- Abstract
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering databanks, it is highly desirable to develop an automated method that can rapidly identify their subcellular locations. This will expedite the annotation process, providing timely and useful information for both basic research and industrial application. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], the functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, the recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate of prediction obtained by jackknife cross-validation was 92%. This is so far the highest success rate achieved on this dataset following an objective and rigorous cross-validation procedure. [Copyright © Elsevier]
- Published
- 2004
23. A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology
- Author
Chou, Kuo-Chen and Cai, Yu-Dong
- Subjects
*GENES; *PROTEINS; *AMINO acids; *ALGORITHMS
- Abstract
Based on recent developments in gene ontology and functional domain databases, a new hybridization approach is developed for predicting protein subcellular location by combining the gene product, functional domain, and quasi-sequence-order effects. As a showcase, the same prokaryotic and eukaryotic datasets studied by many previous investigators are used for demonstration. The overall success rate in the jackknife test is 94.7% for the prokaryotic set and 92.9% for the eukaryotic set. These are so far the highest success rates achieved for the two datasets following a rigorous cross-validation procedure, suggesting that such a hybrid approach may become a very useful high-throughput tool in bioinformatics, proteomics, and molecular cell biology. The very high success rates also reflect the fact that the subcellular localization of a protein is closely correlated with: (1) the biological objective to which the gene or gene product contributes, (2) the biochemical activity of a gene product, and (3) the place in the cell where a gene product is active. [Copyright © Elsevier]
- Published
- 2003
24. Use of correspondence discriminant analysis to predict the subcellular location of bacterial proteins
- Author
Perrière, Guy and Thioulouse, Jean
- Subjects
*DISCRIMINANT analysis; *BACTERIA; *PROTEINS
- Abstract
Correspondence discriminant analysis (CDA) is a multivariate statistical method derived from discriminant analysis that can be used on contingency tables. We have used CDA to separate proteins of Gram-negative bacteria according to their subcellular location. The high resolution of the discrimination obtained makes this method a good tool for predicting subcellular location when this information is not known. The main advantage of this technique is its simplicity: by computing two linear formulae on amino acid composition, it is possible to classify a protein into one of the three subcellular location classes we have defined. The CDA itself can be computed with the ADE-4 software package, which can be downloaded, along with the data set used in this study, from the Pôle Bio-Informatique Lyonnais (PBIL) server at http://pbil.univ-lyon1.fr. [Copyright © Elsevier]
- Published
- 2003
25. Image-based classification of protein subcellular location patterns in human reproductive tissue by ensemble learning global and local features
- Author
Ying-Ying Xu, Shitong Wang, Fan Yang, and Hong-Bin Shen
- Subjects
Computer science; Local binary patterns; Cognitive Neuroscience; Feature extraction; Human Protein Atlas; Pattern recognition; Feature selection; Subcellular localization; Ensemble learning; Computer Science Applications; Support vector machine; Artificial Intelligence; Protein subcellular location; Artificial intelligence; Classifier (UML)
- Abstract
The reproductive system is a specific system of organs working together for the purpose of reproduction. As one of the most significant characteristics of human cells, subcellular localization plays a critical role in understanding the specific functions of mammalian proteins. In this study, we developed a novel computational protocol for predicting protein subcellular locations from microscope images of cells in human reproductive tissues. The protocol consists of three major steps: protein object identification, image feature extraction, and classification. We first separated protein and DNA staining in the images with both linear and non-negative matrix factorization separation methods; then we extracted multi-view global and local protein texture features, including wavelet Haralick, local binary patterns, local ternary patterns, and local quinary patterns; finally, based on the selected important feature subset, we constructed an ensemble classifier with support vector machines for classification. Experiments are performed on a benchmark dataset consisting of seven major subcellular classes in human reproductive tissues collected from the Human Protein Atlas. Our results show that the local texture pattern features play an important complementary role to global features in enhancing prediction performance. An overall accuracy of 85% is obtained with the current system, and when only confident classifications are considered, the accuracy can reach 99%. It is the first image-based protein subcellular location predictor developed specifically for human reproductive tissue. The promising results indicate that the developed protocol can be applied for accurate large-scale protein subcellular localization annotation in the human reproductive system.
- Published
- 2014
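Local binary patterns, one of the texture descriptors listed in the Xu et al. abstract, encode each pixel by comparing it with its eight neighbours. The following is a minimal sketch of the basic 3x3, 256-bin LBP histogram on a grayscale image given as nested lists. The clockwise neighbour ordering and the "at least as bright as the centre" comparison follow one common convention; the paper's exact variant (radius, uniform patterns, ternary or quinary extensions) is not specified here, so treat those choices as assumptions.

```python
from __future__ import annotations

# Neighbour offsets (row, col), clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image: list[list[int]]) -> list[int]:
    """256-bin histogram of basic 3x3 LBP codes over all interior pixels."""
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set this neighbour's bit when it is at least as bright as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    return hist


flat = [[7] * 4 for _ in range(4)]         # constant 4x4 image: 4 interior pixels
print(lbp_histogram(flat)[255])            # 4 (every neighbour equals the centre)
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # bright centre, dark neighbours
print(lbp_histogram(peak)[0])              # 1
```

The resulting histogram (often normalized) is the kind of fixed-length texture vector that can be fed to an SVM alongside global features.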
26. Predicting protein subcellular location with network embedding and enrichment features.
- Author
Pan, Xiaoyong, Lu, Lin, and Cai, Yu-Dong
- Subjects
*RECURRENT neural networks; *ARTIFICIAL neural networks; *DECISION trees; *AMINO acid sequence; *FEATURE selection; *PROTEINS
- Abstract
The subcellular location of a protein is highly related to its function. Identifying the location of a given protein is an essential step in investigating its related problems. Traditional experimental methods can produce reliable determinations, but their limitations, such as high cost and low efficiency, are evident. Computational methods provide an alternative means to address these problems. Most previous methods extract features from protein sequences or structures to build prediction models. In this study, we use two types of features and combine them to construct the model. The first feature type is extracted from a protein–protein interaction network to abstract the relationship between the encoded protein and other proteins. The second type is obtained from gene ontology and biological pathways to indicate the existing functions of the encoded protein. These features are analyzed using several feature selection methods. The final optimum features are used to build the model, with a recurrent neural network as the classification algorithm. The model yields good performance, with a Matthews correlation coefficient of 0.844. A decision tree is used as a rule-learning classifier to extract decision rules. Although the performance of the decision rules is poor, they are valuable in revealing the molecular mechanisms of proteins with different subcellular locations. The final analysis confirms the reliability of the extracted rules. The source code of the proposed method is freely available at https://github.com/xypan1232/rnnloc • Extract protein embeddings from a protein-protein network using Node2Vec. • Combine the protein embedding with enrichment features derived from functional data. • Train a recurrent neural network to classify 16 localizations with multiple steps of feature selection. • Learn classification rules of protein localizations using decision trees. [ABSTRACT FROM AUTHOR]
- Published
- 2020
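The RNN model in the Pan et al. abstract (and CE-PLoc earlier in this list) is evaluated with the Matthews correlation coefficient. For a single location treated one-versus-rest, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes that binary form from true and predicted labels. The abstract does not say how MCC is aggregated over the 16 localizations, so no averaging scheme is implied, and returning 0 when the denominator is zero is a common convention assumed here.

```python
import math
from typing import Sequence


def mcc_one_vs_rest(y_true: Sequence[str], y_pred: Sequence[str], location: str) -> float:
    """Binary Matthews correlation coefficient for one location versus all others."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        actual, predicted = t == location, p == location
        if actual and predicted:
            tp += 1
        elif predicted:      # predicted here, actually elsewhere
            fp += 1
        elif actual:         # actually here, predicted elsewhere
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


y_true = ["nucleus", "nucleus", "cytosol", "cytosol", "mitochondrion"]
y_pred = ["nucleus", "cytosol", "cytosol", "cytosol", "mitochondrion"]
print(round(mcc_one_vs_rest(y_true, y_pred, "nucleus"), 3))  # 0.612
```

MCC ranges from −1 to 1 and, unlike raw accuracy, stays informative when one location dominates the dataset.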
27. Human Protein Subcellular Localization with Integrated Source and Multi-label Ensemble Classifier
- Author
Xiaotong Guo, Ying Ju, Fulin Liu, Zhen Wang, and Chunyu Wang
- Subjects
Multidisciplinary; Subcellular structure; Biology; Subcellular localization; Proteomics; Machine learning; Cell function; Article; Software; Protein sequencing; Protein subcellular location; Artificial intelligence; Classifier (UML)
- Abstract
Predicting protein subcellular location is necessary for understanding cell function. Several machine learning methods have been developed for computational prediction from primary protein sequences because wet-lab experiments are costly and time-consuming. However, two problems still exist in state-of-the-art methods. First, several proteins appear in different subcellular structures simultaneously, whereas current methods assign each protein sequence to only one subcellular structure. Second, most software tools are trained on obsolete data and miss the latest databases. We propose a novel multi-label classification algorithm to solve the first problem and integrate several of the latest databases to improve prediction performance. Experiments demonstrated the effectiveness of the proposed method. The present study should facilitate research on cellular proteomics.
- Published
- 2016
28. Using a Novel AdaBoost Algorithm and Chou's Pseudo Amino Acid Composition for Predicting Protein Subcellular Localization
- Author
Jie Lin and Yan Wang
- Subjects
Sequence; Computer science; Computational Biology; Proteins; General Medicine; Computational biology; Subcellular localization; Biochemistry; Adaboost algorithm; Amino acid composition; Structural Biology; Protein subcellular location; Proteins metabolism; Amino Acids; Pseudo amino acid composition; Algorithms
- Abstract
For a protein, an important characteristic is its location or compartment in a cell, because a protein has to be located in its proper position in a cell to perform its biological functions. Therefore, predicting protein subcellular location is an important and challenging task in current molecular and cellular biology. In this paper, based on the AdaBoost.ME algorithm and Chou's PseAAC (pseudo amino acid composition), a new computational method was developed to identify protein subcellular location. AdaBoost.ME is an improved version of the AdaBoost algorithm that directly extends the original AdaBoost algorithm to multi-class cases without the need to reduce them to multiple two-class problems. In some previous studies, the conventional amino acid composition was applied to represent protein samples. To take into account sequence-order effects, in this study we use Chou's PseAAC to represent protein samples. To demonstrate that AdaBoost.ME is a robust and efficient model for predicting protein subcellular locations, the same protein dataset used by Cedano et al. (Journal of Molecular Biology, 1997, 266: 594-600) is adopted in this paper. The computed results show that the accuracy achieved by our method is better than that of the methods developed by previous investigators.
- Published
- 2011
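Chou's PseAAC, used in the Lin and Wang abstract to capture sequence-order effects, augments the 20 amino acid frequencies f_u with λ sequence-order correlation factors θ_j, each the average, over residue pairs j positions apart, of a correlation function Θ computed from standardized physicochemical property values. The components are f_u / (Σf + w·Σθ) for u = 1..20 and w·θ_(u−20) / (Σf + w·Σθ) for u = 21..20+λ. The sketch below follows that type-1 form with Θ taken as the mean squared difference over the supplied properties. Property tables are left to the caller (the toy table in the usage is made up, not real hydrophobicity data), and λ = 3 and w = 0.05 are illustrative defaults, not the settings used in the paper.

```python
from __future__ import annotations

from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop: dict[str, float]) -> dict[str, float]:
    """Rescale raw property values to zero mean and unit population std over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq: str, properties: list[dict[str, float]], lam: int = 3, w: float = 0.05) -> list[float]:
    """Type-1 pseudo amino acid composition: 20 + lam components summing to 1."""
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    def corr(r1: str, r2: str) -> float:
        # Correlation function: mean squared difference of the standardized properties.
        return mean((p[r2] - p[r1]) ** 2 for p in props)

    # theta_j averages corr over all residue pairs that are j positions apart.
    thetas = [
        mean(corr(seq[i], seq[i + j]) for i in range(len(seq) - j))
        for j in range(1, lam + 1)
    ]
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


toy_property = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}  # made-up values
vec = pseaac("MKVLAAGICLLA", [toy_property], lam=3)
print(len(vec), round(sum(vec), 6))  # 23 1.0
```

The weight w controls how much the sequence-order factors contribute relative to plain composition; setting w = 0 reduces the first 20 components to ordinary amino acid composition.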
29. Model building and intelligent acquisition with application to protein subcellular location classification
- Author
Estelle Glory-Afshar, Robert F. Murphy, Jelena Kovacevic, and C. Jackson
- Subjects
Statistics and Probability; Cell type; Computer science; Cells; Models, Biological; Biochemistry; Software; Protein subcellular location; Molecular Biology; Likelihood Functions; Photobleaching; Process (computing); Proteins; Original Papers; Cellular Structures; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Data mining; Model building; Algorithms
- Abstract
Motivation: We present a framework and algorithms to intelligently acquire movies of protein subcellular location patterns by learning their models as they are being acquired, and simultaneously determining how many cells to acquire as well as how many frames to acquire per cell. This is motivated by the desire to minimize acquisition time and photobleaching, given the need to build such models for all proteins, in all cell types, under all conditions. Our key innovation is to build models during acquisition rather than as a post-processing step, thus allowing us to intelligently and automatically adapt the acquisition process given the model acquired. Results: We validate our framework on protein subcellular location classification, and show that the combination of model building and intelligent acquisition results in time and storage savings without loss of classification accuracy, or alternatively, higher classification accuracy for the same total acquisition time. Availability and implementation: The data and software used for this study will be made available upon publication at http://murphylab.web.cmu.edu/software and http://www.andrew.cmu.edu/user/jelenak/Software. Contact: jelenak@cmu.edu
- Published
- 2011
30. Vaxign: The First Web-Based Vaccine Design Program for Reverse Vaccinology and Applications for Vaccine Development
- Author
Zuoshuang Xiang, Yongqun He, and Harry L. T. Mobley
- Subjects
Health, Toxicology and Mutagenesis; Computational biology; Genome; Epitope; Epitopes; User-Computer Interface; Protein subcellular location; MHC class I; Genetics; Uropathogenic Escherichia coli; Molecular Biology; Internet; Vaccines; Methodology Report; Histocompatibility Antigens Class I; Reverse vaccinology; Reproducibility of Results; Design systems; Sequence Analysis, DNA; General Medicine; Virology; Transmembrane domain; ROC Curve; Molecular Medicine; Algorithms; Genome, Bacterial; Software; Bacterial Outer Membrane Proteins; Biotechnology
- Abstract
Vaxign is the first web-based vaccine design system that predicts vaccine targets based on genome sequences using the strategy of reverse vaccinology. Predicted features in the Vaxign pipeline include protein subcellular location, transmembrane helices, adhesin probability, conservation to human and/or mouse proteins, sequence exclusion from genome(s) of nonpathogenic strain(s), and epitope binding to MHC class I and class II. The precomputed Vaxign database contains prediction of vaccine targets for genomes. Vaxign also performs dynamic vaccine target prediction based on input sequences. To demonstrate the utility of this program, the vaccine candidates against uropathogenic Escherichia coli (UPEC) were predicted using Vaxign and compared with various experimental studies. Our results indicate that Vaxign is an accurate and efficient vaccine design program.
- Published
- 2010
31. Predicting Protein Subcellular Location Using Chou's Pseudo Amino Acid Composition and Improved Hybrid Approach
- Author
Feng-Min Li and Qian-Zhong Li
- Subjects
Cell; Intracellular Space; Biochemistry; Sequence Analysis, Protein; Structural Biology; Protein subcellular location; Plant Cells; Jackknife test; Pseudo amino acid composition; Plant Proteins; Computational Biology; Proteins; General Medicine; Plants; Hybrid approach; Sequence identity; Amino acid; Eukaryotic Cells; Algorithms
- Abstract
The location of a protein in a cell is closely correlated with its biological function. Based on the concept that a protein's subcellular location is mainly determined by its amino acid and pseudo amino acid composition (PseAA), a new algorithm combining increment of diversity with a support vector machine is proposed to predict protein subcellular location. The subcellular locations of plant and non-plant proteins are investigated by our method. The overall prediction accuracies in the jackknife test are 88.3% for the eukaryotic plant proteins and 92.4% for the eukaryotic non-plant proteins, respectively. In order to estimate the effect of sequence identity on the prediction results, the proteins with sequence identity
- Published
- 2008
32. A complexity-based method for predicting protein subcellular location
- Author
Zheng, Xiaoqi, Liu, Taigang, and Wang, Jun
- Published
- 2009
33. Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter
- Author
Yongsheng Ding, Zheng-De Huang, Shi-Huang Shao, Y. Huang, Kuo-Chen Chou, Y. Gao, and Xuan Xiao
- Subjects
Lyapunov function; Organic Chemistry; Clinical Biochemistry; Proteins; Function (mathematics); Proteomics; Models, Biological; Biochemistry; Chebyshev filter; Amino acid; Protein Transport; Protein subcellular location; Amino Acid Sequence; Biological system; Pseudo amino acid composition; Bessel function; Mathematics
- Abstract
With the avalanche of new protein sequences we are facing in the post-genomic era, it is vitally important to develop an automated method for rapidly and accurately determining the subcellular location of uncharacterized proteins. In this article, based on the concept of pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43: 246-255), three pseudo amino acid components are introduced via the Lyapunov index, the Bessel function, and the Chebyshev filter, which can deal more efficiently with the chaos and complexity in protein sequences, leading to a higher success rate in predicting protein subcellular location.
- Published
- 2005
34. Automated Interpretation of Protein Subcellular Location Patterns: Implications for Early Cancer Detection and Assessment
- Author
Robert F. Murphy
- Subjects
Organelles; Orientation (computer vision); General Neuroscience; Visual examination; Proteins; Pattern analysis; Abnormal cell; Pattern recognition; Biology; Protein subcellular localization prediction; General Biochemistry, Genetics and Molecular Biology; Cell biology; Automation; History and Philosophy of Science; Protein subcellular location; Pattern recognition (psychology); Humans; Artificial intelligence; Early Cancer Detection; Cells, Cultured; HeLa Cells
- Abstract
Fluorescence microscopy is a powerful tool for analyzing the subcellular distributions of proteins, but that power has not been fully utilized because most analysis of those distributions has been done by visual examination. This limitation can be overcome using automated pattern recognition methods widely used in other fields. This article summarizes work demonstrating that automated systems can recognize the patterns of major organelles in both two- and three-dimensional images of cultured cells, and that these systems can distinguish similar patterns better than visual examination. The basis for these systems is a set of Subcellular Location Features that capture the essence of subcellular patterns without being sensitive to the extensive variation in the size, shape, and orientation of cells in microscope images. These features can also be used to make sensitive, statistical comparisons of the distribution of a protein between two conditions, such as in the presence and absence of a drug. The possible use of automated pattern analysis methods for improving detection of abnormal cells in cancerous or precancerous tissues is also discussed.
- Published
- 2004
35. Large-Scale Automated Analysis of Location Patterns in Randomly Tagged 3T3 Cells
- Author
García Osuna, Elvira, Hua, Juchang, Bateman, Nicholas W., Zhao, Ting, Berget, Peter B., and Murphy, Robert F.
- Published
- 2007
36. Robust Numerical Features for Description and Classification of Subcellular Location Patterns in Fluorescence Microscope Images
- Author
Gregory Porreca, Meel Velliste, and Robert F. Murphy
- Subjects
Cell type; Artificial neural network; Computer science; Protein subcellular localization prediction; Protein subcellular location; Signal Processing; Pattern recognition (psychology); Fluorescence microscope; Computer vision; Artificial intelligence; Electrical and Electronic Engineering; Image resolution; Information Systems
- Abstract
The ongoing biotechnology revolution promises a complete understanding of the mechanisms by which cells and tissues carry out their functions. Central to that goal is the determination of the function of each protein that is present in a given cell type, and determining a protein's location within cells is critical to understanding its function. As large amounts of data become available from genome-wide determination of protein subcellular location, automated approaches to categorizing and comparing location patterns are urgently needed. Since sub-cellular location is most often determined using fluorescence microscopy, we have developed automated systems for interpreting the resulting images. We report here improved numeric features for describing such images that are fairly robust to image intensity binning and spatial resolution. We validate these features by using them to train neural networks that accurately recognize all major subcellular patterns with an accuracy higher than any previously reported. Having validated the features by using them for classification, we also demonstrate using them to create Subcellular Location Trees that group similar proteins and provide a systematic framework for describing subcellular location.
- Published
- 2003
37. Multilabel learning for protein subcellular location prediction
- Author
Guo-Zheng Li, Xiao Wang, Jia-Ming Liu, Xiaohua Hu, and Rui-Wei Zhao
- Subjects
Biomedical Engineering; Intracellular Space; Pharmaceutical Science; Medicine (miscellaneous); Bioengineering; Computational biology; Biology; Machine learning; Models, Biological; Protein subcellular location; Artificial Intelligence; Leverage (statistics); Humans; Multiplex; Electrical and Electronic Engineering; Databases, Protein; Models, Statistical; Learning problem; Molecular biophysics; Computational Biology; Proteins; Subcellular localization; Computer Science Applications; Learning methods; Artificial intelligence; Algorithms; Biotechnology
- Abstract
Protein subcellular localization aims at predicting the location of a protein within a cell using computational methods. Knowledge of the subcellular localization of proteins indicates protein functions and helps in identifying drug targets. Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most existing protein subcellular localization methods handle only single-location proteins. To better reflect the characteristics of multiplex proteins, we formulate prediction of the subcellular localization of multiplex proteins as a multilabel learning problem. We present and compare two multilabel learning approaches, which exploit correlations between labels and leverage label-specific features, respectively, to induce a high-quality prediction model. Experimental results on six protein data sets from various organisms show that our methods achieve significantly higher performance than any of the existing methods. Among the different multilabel learning methods, we find that methods exploiting label correlations perform better than those leveraging label-specific features.
- Published
- 2012
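Both multi-label papers in this list (this one and the integrated-source ensemble of Guo et al.) treat a protein as possibly belonging to several locations at once. A minimal baseline for that formulation, binary relevance with one independent score per location, is sketched below together with the Hamming loss often used to evaluate multi-label predictions. This is not either paper's method (they exploit label correlations, label-specific features or ensembles); the scores, the 0.5 threshold and the at-least-one-location fallback are assumptions for illustration.

```python
from __future__ import annotations

from typing import Iterable


def predict_location_set(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Binary relevance: keep every location whose independent score passes the threshold."""
    chosen = {loc for loc, s in scores.items() if s >= threshold}
    if not chosen:
        # Assumed fallback: every protein gets at least its top-scoring location.
        chosen = {max(scores, key=scores.get)}
    return chosen


def hamming_loss(true_sets: Iterable[set[str]], pred_sets: Iterable[set[str]], n_labels: int) -> float:
    """Fraction of (protein, location) decisions that are wrong."""
    pairs = list(zip(true_sets, pred_sets))
    # Symmetric difference counts both missed and spurious locations.
    errors = sum(len(t ^ p) for t, p in pairs)
    return errors / (len(pairs) * n_labels)


scores = {"nucleus": 0.8, "cytoplasm": 0.6, "mitochondrion": 0.1}  # hypothetical outputs
pred = predict_location_set(scores)
print(sorted(pred))                                             # ['cytoplasm', 'nucleus']
print(round(hamming_loss([{"nucleus"}], [pred], n_labels=3), 3))  # 0.333
```

Binary relevance ignores dependencies between locations, which is exactly the limitation that correlation-exploiting multilabel methods are designed to address.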
Discovery Service for Jio Institute Digital Library