Application of a developed triple-classification machine learning model for carcinogenic prediction of hazardous organic chemicals to the US, EU, and WHO based on Chinese database.

Authors :
Hao, Ning
Sun, Peixuan
Zhao, Wenjin
Li, Xixi
Source :
Ecotoxicology & Environmental Safety; Apr2023, Vol. 255, pN.PAG-N.PAG, 1p
Publication Year :
2023

Abstract

Cancer, the second largest human disease, has become a major public health problem. The prediction of chemicals' carcinogenicity before their synthesis is crucial. In this paper, seven machine learning algorithms (i.e., Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), Complement Naive Bayes (CNB), K-Nearest Neighbor (KNN), XGBoost, and Multilayer Perceptron (MLP)) were used to construct the carcinogenicity triple classification prediction (TCP) model (i.e., 1A, 1B, Category 2). A total of 1444 descriptors of 118 hazardous organic chemicals were calculated by Discovery Studio 2020, Sybyl X-2.0 and PaDEL-Descriptor software. The constructed carcinogenicity TCP model was evaluated through five model evaluation indicators (i.e., Accuracy, Precision, Recall, F1 Score and AUC). The model evaluation results show that all five indicators meet requirements (greater than 0.6). The accuracy of the RF, LR, XGBoost, and MLP models for predicting carcinogenicity of Category 2 is 91.67%, 79.17%, 100%, and 100%, respectively. In addition, the machine learning model constructed in this study has potential for error correction. Taking the XGBoost model as an example, the predicted carcinogenicity level of 1,2,3-Trichloropropane (CAS 96-18-4) is Category 2, whereas the actual carcinogenicity level is 1B. However, the difference between Category 2 and 1B is only 0.004, indicating that XGBoost is one optimum model of the seven constructed machine learning models. Besides, results showed that functional groups like chlorine and the benzene ring might influence the prediction of carcinogenic classification. Therefore, considering the functional group characteristics of chemicals before constructing the carcinogenicity prediction model of organic chemicals is recommended. The predicted carcinogenicity of the organic chemicals using the optimum machine learning model (i.e., XGBoost) was also evaluated and verified by the toxicokinetics.
The RF and XGBoost TCP models constructed in this paper can be used for carcinogenicity detection before synthesizing new organic substances. It also provides technical support for the subsequent management of organic chemicals.

• Development of a triple-classification model using 7 machine learning algorithms.
• First triple-classification model for carcinogenicity classification prediction.
• Verification of triple-classification model by US, EU, and WHO databases.
• RF and XGBoost models as wide applications of carcinogenicity prediction.

[ABSTRACT FROM AUTHOR]
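
The abstract names Accuracy, Precision, Recall, and F1 Score among its evaluation indicators and a requirement of greater than 0.6. As a rough illustration only, the sketch below shows how such indicators could be computed for a three-class problem (1A, 1B, Category 2). Macro averaging, the function names `macro_metrics` and `meets_requirement`, and the toy labels are all assumptions made for this example; the abstract does not state how the authors averaged across classes, and AUC is omitted because it needs predicted probabilities.

```python
from collections import Counter

# Class labels taken from the abstract; everything else here is hypothetical.
LABELS = ("1A", "1B", "Category 2")


def _safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an assumption, not stated in the abstract) gives each
    class equal weight regardless of how many samples it has.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = _safe_div(tp[label], tp[label] + fp[label])
        r = _safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(_safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": _safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def meets_requirement(metrics, threshold=0.6):
    """Check the 'greater than 0.6' requirement quoted in the abstract."""
    return all(value > threshold for value in metrics.values())


if __name__ == "__main__":
    # Toy labels for illustration; these are not data from the paper.
    y_true = ["1A", "1B", "Category 2", "Category 2"]
    y_pred = ["1A", "Category 2", "Category 2", "Category 2"]
    scores = macro_metrics(y_true, y_pred)
    print(scores)
    print(meets_requirement(scores))
```

In this toy case the single 1B sample is misclassified as Category 2, which lowers the macro-averaged scores more than it lowers accuracy; that is the kind of effect per-class averaging is meant to expose.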

Details

Language :
English
ISSN :
01476513
Volume :
255
Database :
Supplemental Index
Journal :
Ecotoxicology & Environmental Safety
Publication Type :
Academic Journal
Accession number :
162894981
Full Text :
https://doi.org/10.1016/j.ecoenv.2023.114806