Spatial or Random Cross-Validation? The Effect of Resampling Methods in Predicting Groundwater Salinity with Machine Learning in Mediterranean Region.

Authors :
Tziachris, Panagiotis
Nikou, Melpomeni
Aschonitis, Vassilis
Kallioras, Andreas
Sachsamanoglou, Katerina
Fidelibus, Maria Dolores
Tziritis, Evangelos
Source :
Water (20734441); Jun2023, Vol. 15 Issue 12, p2278, 18p
Publication Year :
2023

Abstract

Machine learning (ML) algorithms are extensively used with outstanding prediction accuracy. However, in some cases, their overfitting capabilities, along with inadvertent biases, might produce overly optimistic results. Spatial data are a special kind of data that could introduce biases to ML due to their intrinsic spatial autocorrelation. To address this issue, a special resampling method has emerged called spatial cross-validation (SCV). The purpose of this study was to evaluate the performance of SCV compared with conventional random cross-validation (CCV) used in most ML studies. Multiple ML models were created with CCV and SCV to predict groundwater electrical conductivity (EC) with data (A) from Rhodope, Greece, in the summer of 2020; (B) from the same area but at a different time (summer 2019); and (C) from a new area (the Salento peninsula, Italy). The results showed that the SCV provides ML models with superior generalization capabilities and, hence, better prediction results in new unknown data. The SCV seems to be able to capture the spatial patterns in the data while also reducing the over-optimism bias that is often associated with CCV methods. Based on the results, SCV could be applied with ML in studies that use spatial data. [ABSTRACT FROM AUTHOR]
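The difference between the two resampling schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors partitioned their data, so the grid-block approach below is only one common way to build spatial folds, not the study's method. The function names, the `block_size` parameter, and the randomly generated well coordinates are all hypothetical and use only the Python standard library.

```python
import random


def random_folds(n_samples, k, seed=0):
    """Conventional random CV: shuffle indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def spatial_block_folds(coords, k, block_size):
    """Spatial CV sketch: group samples by grid block, keep each block in one fold."""
    blocks = {}
    for i, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(i)
    folds = [[] for _ in range(k)]
    # Largest blocks first, each into the currently smallest fold, to balance sizes.
    for key in sorted(blocks, key=lambda b: (-len(blocks[b]), b)):
        min(folds, key=len).extend(blocks[key])
    return [sorted(f) for f in folds]


# Hypothetical sampling locations (assumption: 60 wells on a 10 x 10 area).
rng = random.Random(42)
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]

ccv = random_folds(len(coords), k=5)
scv = spatial_block_folds(coords, k=5, block_size=2.5)

for name, folds in (("random", ccv), ("spatial", scv)):
    print(name, [len(f) for f in folds])
```

With random folds, a held-out well can sit right next to a training well, so spatial autocorrelation can leak information into the test set. With block folds, every held-out well lies in a grid cell that contributed nothing to training, which is closer to predicting in a new area, although wells near a block boundary can still have close neighbours in the training set.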

Details

Language :
English
ISSN :
20734441
Volume :
15
Issue :
12
Database :
Complementary Index
Journal :
Water (20734441)
Publication Type :
Academic Journal
Accession number :
164684497
Full Text :
https://doi.org/10.3390/w15122278