
HPO-empowered machine learning with multiple environment variables enables spatial prediction of soil heavy metals in coastal delta farmland of China.

Authors :
Song, Yingqiang
Zhan, Dexi
He, Zhenxin
Li, Wenhui
Duan, Wenxu
Yang, Zhongkang
Lu, Miao
Source :
Computers & Electronics in Agriculture. Oct2023, Vol. 213, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

• A machine learning method assisted by automatic hyperparameter optimization was developed.
• The TPE-XGBoost model has the best performance for spatial prediction of soil heavy metals.
• Air quality and hyperspectral variables contribute most to prediction accuracy.
• There is a source-receptor coupling path for the accumulation of soil heavy metals.
• Soil heavy metals with high concentrations are concentrated around rivers.

Machine learning (ML) models have been widely used for predicting the spatial variability of soil heavy metals. However, it is impossible to explore the entire hyperparameter space of ML models through manual trial-and-error experimentation. Here, an automatic hyperparameter optimization-based machine learning (HPO-ML) method with three search algorithms and random forest (RF) and extreme gradient boosting (XGBoost) models was developed to predict the heavy metal content in soil from multiple environmental variables. The tree-structured Parzen estimator (TPE) algorithm outperformed the other search algorithms in identifying the optimal hyperparameters of the RF and XGBoost models. The model prediction results showed that TPE-XGBoost had the highest accuracy for predicting the As (RMSE = 3.06 mg kg⁻¹ and R² = 70.35%), Cd (RMSE = 0.10 mg kg⁻¹ and R² = 75.43%), Cr (RMSE = 13.86 mg kg⁻¹ and R² = 82.11%), Ni (RMSE = 3.19 mg kg⁻¹ and R² = 75.20%), Pb (RMSE = 3.75 mg kg⁻¹ and R² = 74.79%), and Zn (RMSE = 6.83 mg kg⁻¹ and R² = 70.05%) contents. The TPE-XGBoost mapping result showed that areas with high concentrations of soil heavy metals were concentrated in the central and eastern areas (As), the mainstream of the Yellow River (Cd), the northeast area (Cr), the ancient watercourse of the Yellow River (Ni and Pb), and the central and northeastern areas (Zn). The SHapley additive explanation (SHAP) and structural equation model (SEM) were used to interpret the drivers of environmental variables.
The variables with the highest contributions were CO, PM2.5, O3, PC3, PC1, and PC4 for predicting the As, Cd, Cr, Ni, Pb, and Zn contents, respectively, and there was a significant source-receptor coupling path. The results demonstrate the feasibility of using the HPO-ML approach in hyperparameter-limited conditions, providing data-driven pathways and options to support the high-quality development of agriculture and the protection of farmland soil ecosystems. [ABSTRACT FROM AUTHOR]
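
The abstract names the tree-structured Parzen estimator (TPE) as the best-performing search algorithm but does not describe how it works. The block below is a minimal sketch of the general idea, reduced to a single continuous hyperparameter and written with the Python standard library only. It is not the authors' code and not the implementation found in any HPO library. The function names, the fixed kernel bandwidth, the `gamma` quantile, and the toy objective (a stand-in for a cross-validated error with a hypothetical best learning rate of 0.3) are all assumptions made for illustration.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Gaussian-kernel (Parzen window) density estimate at x."""
    if not centers:
        return 0.0
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_minimize(objective, low, high, n_trials=40, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified one-dimensional TPE-style search (illustrative only)."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # assumed fixed kernel width
    history = []  # list of (hyperparameter value, loss)
    for trial in range(n_trials):
        if trial < n_startup:
            # Random warm-up trials to seed the density estimates.
            x = rng.uniform(low, high)
        else:
            # Split past trials into a "good" group (lowest losses) and the rest.
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [value for value, _ in ranked[:n_good]]
            bad = [value for value, _ in ranked[n_good:]]
            # Draw candidates around good points, clipped to the search range.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Keep the candidate with the largest good/bad density ratio.
            x = max(
                candidates,
                key=lambda c: parzen_density(c, good, bandwidth)
                / (parzen_density(c, bad, bandwidth) + 1e-12),
            )
        history.append((x, objective(x)))
    return min(history, key=lambda pair: pair[1])


def toy_validation_loss(learning_rate):
    """Hypothetical stand-in for a model's validation error."""
    return (learning_rate - 0.3) ** 2


best_rate, best_loss = tpe_minimize(toy_validation_loss, 0.0, 1.0)
print(f"best learning_rate={best_rate:.3f}, loss={best_loss:.5f}")
```

In a real workflow the toy objective would be replaced by a function that trains an RF or XGBoost model with the proposed hyperparameters and returns a validation RMSE, and the search would cover several hyperparameters at once.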

Details

Language :
English
ISSN :
0168-1699
Volume :
213
Database :
Academic Search Index
Journal :
Computers & Electronics in Agriculture
Publication Type :
Academic Journal
Accession number :
172844808
Full Text :
https://doi.org/10.1016/j.compag.2023.108254