Improved environmental mapping and validation using bagging models with spatially clustered data.

Authors :
Misiuk, Benjamin
Brown, Craig J.
Source :
Ecological Informatics; Nov2023, Vol. 77, pN.PAG-N.PAG, 1p
Publication Year :
2023

Abstract

Spatially clustered sampling may result in non-independent data that pose challenges for environmental mapping applications. Two outstanding challenges resulting from the use of spatially clustered data for predictive geospatial modelling with machine learning approaches are biased model training and validation. These issues can be severe for popular bagging models such as Random Forest, yet one or both are often ignored or are handled using sub-optimal approaches. We propose to address these challenges using information on both the spatial autocorrelation of map errors and the spatial sampling intensity. This is achieved by applying the residual spatial covariance as a weighting function for the bagging procedure and for the calculation of weighted validation statistics. Using this approach, the full feature space of the sample data is retained during model training and validation. The utility of covariance weighting for these purposes is investigated through extensive simulation with a range of sample clustering configurations. Results are benchmarked against existing approaches. Covariance weighting improved model performance across a range of clustering scenarios but appeared to produce the greatest improvements for highly clustered data. Covariance-weighted validation demonstrated low bias across a broad range of clustering scenarios compared to existing spatial methods. Findings also suggest, though, that conditional Gaussian simulation approaches may perform well when the proportion of clustered data is very high. Covariance weighting is straightforward to implement, computationally efficient, and scales to different sample sizes and spatial extents.

• Residual spatial covariance is calculated as weights for clustered data.
• Weighted bagging with Random Forest reduced training bias when using clustered data.
• Weighted validation produced low bias under a range of clustering scenarios.
• Covariance weighting uses both autocorrelation and sample intensity information.
• Covariance weighting enables use of the full feature space during validation.

[ABSTRACT FROM AUTHOR]
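
The abstract describes using residual spatial covariance as a weighting function for both bagging and validation, but it does not state the weighting formula. The sketch below is therefore only one plausible reading, not the authors' implementation. It assumes an exponential covariance model with hypothetical `sill` and `range_` parameters (in practice these would be estimated from model residuals), and it assumes each sample's weight is the inverse of its summed covariance with all samples, so that points in dense clusters are down-weighted. All function names are illustrative.

```python
import math
import random


def exp_covariance(h, sill=1.0, range_=1.0):
    """Exponential covariance at distance h (assumed model form)."""
    return sill * math.exp(-h / range_)


def covariance_weights(coords, sill=1.0, range_=1.0):
    """Assumed weighting: inverse of each sample's summed covariance
    with all samples (itself included), normalised to sum to 1."""
    inverse_sums = []
    for xi, yi in coords:
        total = sum(
            exp_covariance(math.hypot(xi - xj, yi - yj), sill, range_)
            for xj, yj in coords
        )
        inverse_sums.append(1.0 / total)
    norm = sum(inverse_sums)
    return [w / norm for w in inverse_sums]


def weighted_bootstrap(weights, rng):
    """Draw one bootstrap sample of indices (with replacement),
    with selection probability proportional to the weights."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


def weighted_rmse(observed, predicted, weights):
    """Root mean squared error with per-sample weights."""
    num = sum(w * (o - p) ** 2 for o, p, w in zip(observed, predicted, weights))
    return math.sqrt(num / sum(weights))


# Three tightly clustered samples and one isolated sample.
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]
weights = covariance_weights(coords, sill=1.0, range_=1.0)

# Indices for one bagged learner; the isolated sample is drawn more often
# on average than any single clustered sample.
bag = weighted_bootstrap(weights, random.Random(0))

# Validation statistic that counts the isolated sample's error more heavily.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
score = weighted_rmse(observed, predicted, weights)
```

With a library such as scikit-learn, the same weights could in principle be passed as sample weights or used to draw each tree's training subset, though how the authors integrate them with Random Forest is not specified in the abstract.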

Details

Language :
English
ISSN :
15749541
Volume :
77
Database :
Supplemental Index
Journal :
Ecological Informatics
Publication Type :
Academic Journal
Accession number :
171879421
Full Text :
https://doi.org/10.1016/j.ecoinf.2023.102181