PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy.

Authors :
Wang, Houqiang
Li, Hong
Gao, Weifeng
Xie, Jin
Source :
Analytical Biochemistry. Dec2022, Vol. 658, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a post-translational protein modification (PTM). It is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast. However, few studies have predicted ubiquitination sites in Arabidopsis thaliana. In view of this, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. Firstly, six features based on the protein sequence are extracted with the amino acid index database (AAindex), dipeptide deviation from the expected mean (DDE), dipeptide composition (DPC), blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Secondly, the synthetic minority over-sampling technique (SMOTE) is utilized to process the imbalanced data set. Then a new classifier named DG is presented, which includes a Dense block, a Residual block and a Gated recurrent unit (GRU) block. Finally, each of the six feature extraction methods is integrated into the DG model, and an ensemble learning strategy is used to obtain the final prediction result. Experimental results show that PrUb-EL has good predictive ability, with accuracy (ACC) and area under the ROC curve (auROC) values of 91.00% and 97.70%, respectively, using 5-fold cross-validation. In the independent test, the values of ACC and auROC are 88.58% and 96.09%, respectively. Compared with previous studies, our model has significantly improved performance; thus, it is an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.

• A new model is proposed to predict ubiquitination sites with high accuracy.
• AAindex, DDE, DPC, BLOSUM62, EAAC and Binary are applied to extract features from the datasets.
• The SMOTE oversampling method is utilized to process the imbalanced data set.
• A new classifier named DG is presented, which includes Dense block, Residual block and GRU block.
• Six feature extraction methods are adopted for base classifiers, and an ensemble learning strategy is used for the final result.

[ABSTRACT FROM AUTHOR]
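
To make two of the sequence encodings named in the abstract more concrete, the following is a minimal sketch of binary (one-hot) encoding and dipeptide composition (DPC) for a peptide window, plus a simple way to combine base-classifier outputs. It is not the authors' implementation: the function names, the 11-residue example window, and the probability-averaging rule are assumptions made here for illustration, since the abstract does not state the window length or how the six DG models are combined. AAindex, DDE, BLOSUM62, EAAC, SMOTE and the DG network itself are not shown.

```python
from itertools import product

# The 20 standard amino acids; any other symbol (e.g. padding) is treated as unknown.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def binary_encode(window):
    """One-hot encode each residue; unknown residues become all-zero rows."""
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def dipeptide_composition(window):
    """Relative frequency of each of the 400 ordered dipeptides in the window."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(window) - 1):
        pair = window[i:i + 2]
        if pair in counts:  # skip pairs containing unknown residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


def soft_vote(probabilities, threshold=0.5):
    """Assumed ensemble rule: average base-model probabilities, then threshold."""
    mean = sum(probabilities) / len(probabilities)
    return mean, mean >= threshold


if __name__ == "__main__":
    # Hypothetical 11-residue window centred on a candidate lysine (K).
    window = "ACDEFKGHILM"
    print(len(binary_encode(window)))          # 11 residues x 20 amino acids
    print(len(dipeptide_composition(window)))  # 400 dipeptide frequencies
    # Hypothetical probabilities from three base classifiers.
    print(soft_vote([0.9, 0.6, 0.3]))
```

In this sketch each encoding yields a fixed-length vector that a separate base classifier could consume; the code in the repository linked in the abstract should be consulted for the actual feature definitions and ensemble strategy.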

Details

Language :
English
ISSN :
00032697
Volume :
658
Database :
Academic Search Index
Journal :
Analytical Biochemistry
Publication Type :
Academic Journal
Accession number :
159755308
Full Text :
https://doi.org/10.1016/j.ab.2022.114935