1. Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting.
- Author
-
Mahmud, S.M. Hasan, Chen, Wenyu, Meng, Han, Jahan, Hosney, Liu, Yongsheng, and Hasan, S.M. Mamun
- Subjects
-
*FEATURE selection, *PROTEIN-protein interactions, *BOOSTING algorithms, *AMINO acid sequence, *MOLECULAR structure, *DESCRIPTOR systems, *PROTEIN content of food
- Abstract
Accurate identification of drug-target interaction (DTI) is a crucial and challenging task in the drug discovery process, with enormous benefit to patients and pharmaceutical companies. Traditional wet-lab experiments for DTI are expensive, time-consuming, and labor-intensive. Many computational techniques have therefore been established for this purpose; however, a huge number of interactions remain undiscovered. Here, we present pdti-EssB, a new computational model for identifying DTIs using protein sequence and drug molecular structure. More specifically, each drug molecule is represented as a molecular substructure fingerprint. For a protein sequence, different descriptors are used to represent its evolutionary, sequence, and structural information. In addition, the proposed method uses data-balancing techniques to handle the class imbalance problem and applies a novel feature eliminator to select the optimal features for accurate prediction. In this paper, four classes of DTI benchmark datasets are used to construct a predictive model with XGBoost. The auROC is used as the evaluation metric to compare the performance of pdti-EssB with recent methods under five-fold cross-validation. The experimental results indicate that the proposed method outperforms other approaches in predicting DTIs and suggests new drug-target interactions based on prediction probability scores. The pdti-EssB webserver is available online at http://pdtiessb-uestc.com/
• The computational model pdti-EssB is proposed for predicting drug-target interactions (DTIs).
• MSF, PSSM-Bigram, PseAAC, and SPIDER2 are used to extract drug-target features.
• The CUS and EnsRFS techniques are proposed to resolve the class imbalance problem and the high dimensionality of the data.
• A boosting algorithm is used as the classifier.
• The model achieves the best prediction performance and can effectively predict potential DTIs.
[ABSTRACT FROM AUTHOR]
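The abstract names two generic building blocks, balancing the classes by undersampling and scoring predictions with auROC. The sketch below illustrates both in plain Python under stated assumptions: it uses simple random undersampling as a stand-in (the paper's CUS and EnsRFS procedures are not described in this record and are not reproduced here), and the function names `random_undersample` and `auroc` are hypothetical, chosen only for illustration.

```python
import random


def random_undersample(X, y, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows.

    Assumption: plain random undersampling, not the paper's CUS technique.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative, with ties counted as one half.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 2 interacting pairs (label 1) among 10 drug-target pairs.
X = [[i] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = random_undersample(X, y)
print(len(X_bal), sum(y_bal))  # balanced set: 4 rows, 2 of them positive
```

In a fuller pipeline the balanced rows would be passed to a boosting classifier and `auroc` would be averaged over the five cross-validation folds; the pairwise formula is quadratic in the number of samples, so a rank-based implementation is preferable for large datasets.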
- Published
- 2020