Back to Search Start Over

SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting.

Authors :
Wang, Minghui
Cui, Xiaowen
Yu, Bin
Chen, Cheng
Ma, Qin
Zhou, Hongyan
Source :
Neural Computing & Applications; Sep2020, Vol. 32 Issue 17, p13843-13862, 20p
Publication Year :
2020

Abstract

Protein cysteine S-sulfenylation is an essential and reversible post-translational modification that plays a crucial role in transcriptional regulation, stress response, cell signaling and protein function. Studies have shown that S-sulfenylation is involved in many human diseases such as cancer, diabetes and arteriosclerosis. However, experimental identification of protein S-sulfenylation sites is generally expensive and time-consuming. In this study, we proposed a new protein S-sulfenylation sites prediction method SulSite-GTB. First, fusion of amino acid composition, dipeptide composition, encoding based on grouped weight, K nearest neighbors, position-specific amino acid propensity, position-weighted amino acid composition and pseudo-position specific score matrix feature extraction to obtain the initial feature space. Secondly, we use the synthetic minority oversampling technique (SMOTE) algorithm to process the class imbalance data, and the least absolute shrinkage and selection operator (LASSO) are employed to remove the redundant and irrelevant features. Finally, the optimal feature subset is input into the gradient tree boosting classifier to predict the S-sulfenylation sites, and the five-fold cross-validation and independent test set method are used to evaluate the prediction performance of the model. Experimental results showed the overall prediction accuracy is 92.86% and 88.53%, respectively, and the AUC values are 0.9706 and 0.9425, respectively, on the training set and the independent test set. Compared with other prediction methods, the results show that the proposed method SulSite-GTB is significantly superior to other state-of-the-art methods and provides a new idea for the prediction of post-translational modification sites of other proteins. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/SulSite-GTB/. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
09410643
Volume :
32
Issue :
17
Database :
Complementary Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession number :
145259085
Full Text :
https://doi.org/10.1007/s00521-020-04792-z