1. Extension of pQSAR: Ensemble Model Generated by Random Forest and Partial Least Squares Regressions
- Author
Byung Chun Kim, Dosang Joe, Youngho Woo, Yongkuk Kim, and Gangjoon Yoon
- Subjects
Bio-activity prediction, drug discovery, fingerprint, optimization, QSAR, similar property principle, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Quantitative structure-activity relationship (QSAR) regression models are mathematical models that relate the structural properties of chemicals to the potencies of their biological activities. In QSAR models, the physical and chemical information of molecules is encoded as numerical values called descriptors. Recently, experimental test results (profiles) have been used as descriptors of chemicals. The Profile QSAR 2.0 (pQSAR) model proposed by Martin et al. is a multitask, two-step machine learning prediction method that combines random forest regressions (RFRs) and partial least squares regressions (PLSRs). In the pQSAR model, one first fills the missing values of the profile table with RFRs and then builds PLSRs on the profile predictions. Note that in the second step of the pQSAR method, the predictor variables of the PLSRs are profiles, that is, activity values, and the response variables are also activity values. Thus the PLSRs can be used to update the profile table, and the second step can then be repeated. In this work, we propose an extended pQSAR model generated by RFRs and PLSRs. We conducted experiments on the PKIS and ChEMBL data sets in which the fully predicted initial profile table is updated iteratively by two kinds of prediction models, RFRs and PLSRs. Even though the prediction performance of individual combinations of RFRs and PLSRs varies, the average of all possible predicted profile tables at a given iteration shows better performance. This ensemble model achieves better prediction performance than the pQSAR model in terms of Pearson's R2.
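The ensemble step described in the abstract can be illustrated with a minimal sketch. It follows one reading of the abstract: at each iteration the profile table is updated by either an RFR-based or a PLSR-based step, every sequence of such choices yields one predicted table, and the ensemble is the element-wise average of those tables. The function `ensemble_table` and the two toy updaters are hypothetical names introduced here for illustration. The updaters are arithmetic stand-ins, not fitted regressions, and the authors' actual procedure may differ in detail.

```python
from itertools import product


def ensemble_table(initial, updaters, n_iter):
    """Average the tables produced by every length-n_iter sequence of updaters.

    initial  : profile table as a list of rows (compounds x assays)
    updaters : callables mapping a table to an updated table
               (assumed stand-ins for an RFR-based and a PLSR-based update)
    n_iter   : number of update iterations
    """
    tables = []
    # With two updaters there are 2**n_iter sequences, e.g. (RFR, PLSR, PLSR).
    for sequence in product(updaters, repeat=n_iter):
        table = initial
        for update in sequence:
            table = update(table)
        tables.append(table)

    n_rows, n_cols = len(initial), len(initial[0])
    # Element-wise mean over all candidate tables.
    return [
        [sum(t[i][j] for t in tables) / len(tables) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy stand-ins (assumptions): real updaters would refit models on the table.
def toy_rfr_update(table):
    return [[v + 1.0 for v in row] for row in table]


def toy_plsr_update(table):
    return [[v * 2.0 for v in row] for row in table]


initial = [[1.0, 2.0], [3.0, 4.0]]
avg = ensemble_table(initial, [toy_rfr_update, toy_plsr_update], n_iter=2)
```

Because the number of sequences grows as 2 to the power of the iteration count, a sketch like this is only practical for a small number of iterations.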
- Published
- 2020