Synthetic resampling strategies and machine learning for digital soil mapping in Iran.

Authors :
Taghizadeh‐Mehrjardi, Ruhollah
Schmidt, Karsten
Eftekhari, Kamran
Behrens, Thorsten
Jamshidi, Mohammad
Davatgar, Naser
Toomanian, Norair
Scholten, Thomas
Source :
European Journal of Soil Science; May2020, Vol. 71 Issue 3, p352-368, 17p
Publication Year :
2020

Abstract

Most common machine learning (ML) algorithms usually work well on balanced training sets, that is, datasets in which all classes are approximately represented equally. Otherwise, the accuracy estimates may be unreliable and classes with only a few values are often misclassified or neglected. This is known as a class imbalance problem in machine learning and datasets that do not meet this criterion are referred to as imbalanced data. Most datasets of soil classes are, therefore, imbalanced data. One of our main objectives is to compare eight resampling strategies that have been developed to counteract the imbalanced data problem. We compared the performance of five of the most common ML algorithms with the resampling approaches. The highest increase in prediction accuracy was achieved with SMOTE (the synthetic minority oversampling technique). In comparison to the baseline prediction on the original dataset, we achieved an increase of about 10, 20 and 10% in the overall accuracy, kappa index and F‐score, respectively. Regarding the ML approaches, random forest (RF) showed the best performance with an overall accuracy, kappa index and F‐score of 66, 60 and 57%, respectively. Moreover, the combination of RF and SMOTE improved the accuracy of the individual soil classes, compared to RF trained on the original dataset and allowed better prediction of soil classes with a low number of samples in the corresponding soil profile database, in our case for Chernozems. Our results show that balancing existing soil legacy data using synthetic sampling strategies can significantly improve the prediction accuracy in digital soil mapping (DSM). 
Highlights:
Spatial distribution of soil classes in Iran can be predicted using machine learning (ML) algorithms.
The synthetic minority oversampling technique overcomes the drawback of imbalanced and highly biased soil legacy data.
When combining a random forest model with synthetic sampling strategies the prediction accuracy of the soil model improves significantly.
The resulting new soil map of Iran has a much higher spatial resolution compared to existing maps and displays new soil classes that have not yet been mapped in Iran.
[ABSTRACT FROM AUTHOR]
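
The abstract names SMOTE as the resampling strategy that gave the largest accuracy gain but does not describe how it works. The following is a minimal sketch of the general SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the authors' implementation. The function name `smote_oversample`, its parameters, the toy covariates and the class label "other" are all assumptions made for illustration; only "Chernozems" is taken from the abstract as an example of a sparsely sampled class.

```python
import numpy as np


def smote_oversample(X, y, minority_label, n_new, k=5, seed=0):
    """Append n_new synthetic samples for one class (SMOTE-style sketch).

    Hypothetical helper for illustration; not the code used in the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbours,
    # then place the synthetic point somewhere on the segment between them.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])

    X_out = np.vstack([X, X_new])
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out


# Toy data (assumed): 50 profiles of a common class, 6 of a rare one,
# each described by 3 made-up covariates.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (6, 3))])
y = np.array(["other"] * 50 + ["Chernozems"] * 6)

X_bal, y_bal = smote_oversample(X, y, "Chernozems", n_new=44)
print(dict(zip(*np.unique(y_bal, return_counts=True))))
```

In a workflow like the one the abstract describes, resampling of this kind would normally be applied to the training data only, so that accuracy, kappa and F-score are still estimated on unmodified observations.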

Details

Language :
English
ISSN :
13510754
Volume :
71
Issue :
3
Database :
Complementary Index
Journal :
European Journal of Soil Science
Publication Type :
Academic Journal
Accession number :
143072160
Full Text :
https://doi.org/10.1111/ejss.12893