1. Spatial dependence between training and test sets: another pitfall of classification accuracy assessment in remote sensing
- Author
N. Karasiak, Claude Monteil, Jean-François Dejoux, David Sheeren
Dynamiques et écologie des paysages agriforestiers (DYNAFOR), École nationale supérieure agronomique de Toulouse [ENSAT], Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées, Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)
Centre d'études spatiales de la biosphère (CESBIO), Centre National de la Recherche Scientifique (CNRS), Institut de Recherche pour le Développement (IRD), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Institut national des sciences de l'Univers (INSU - CNRS), Observatoire Midi-Pyrénées (OMP), Météo France, Centre National d'Études Spatiales [Toulouse] (CNES)
French Ministry of Higher Education and Research (University of Toulouse)
- Subjects
Contextual image classification; Pixel; Property (programming); Computer science; Overfitting; Cross-validation; 02 engineering and technology; Remote sensing; Python (programming language); Accuracy assessment; Artificial Intelligence; Sample size determination; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; [INFO]Computer Science [cs]; 020201 artificial intelligence & image processing; Spatial dependence; Spatial analysis; Spatial autocorrelation; Software; Independence (probability theory)
- Abstract
Spatial autocorrelation is inherent to remotely sensed data: nearby pixels are more similar than distant ones. This property can help improve classification performance when spatial or contextual features are added to the model. However, it can also lead to overestimating a model's generalisation capabilities if the spatial dependence between training and test sets is ignored. In this paper, we review existing approaches that deal with spatial autocorrelation for image classification in remote sensing and demonstrate how strongly accuracy metrics are biased when spatial independence between the training and test sets is not respected. We compare three spatial and non-spatial cross-validation strategies at pixel and object levels and study how performance varies with sample size. Experiments based on Sentinel-2 data for mapping two simple forest classes show that spatial leave-one-out cross-validation is the best strategy for providing unbiased estimates of predictive error. Its performance metrics are consistent with the real quality of the resulting map, in contrast to traditional non-spatial cross-validation, which overestimates accuracy. This highlights the need to change practices in classification accuracy assessment. To encourage this change, we developed Museo ToolBox, an open-source Python library that makes spatial cross-validation possible.
- Published
- 2021
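The spatial leave-one-out idea named in the abstract can be illustrated with a minimal sketch. It assumes each sample has planar coordinates and that a buffer distance is supplied by the user; the abstract does not say how the authors choose that distance. The function name `spatial_leave_one_out` and the sample coordinates are hypothetical, and this is not the Museo ToolBox API. Each sample is held out in turn, and every other sample lying within the buffer is excluded from training so that the test sample is not surrounded by near-duplicates of itself.

```python
import math


def spatial_leave_one_out(coords, buffer_distance):
    """Yield (train_indices, test_index) pairs.

    Each sample is used once as the test sample. Training samples are
    restricted to those strictly farther than buffer_distance from it.
    """
    for test_idx, test_xy in enumerate(coords):
        train_idx = [
            i
            for i, xy in enumerate(coords)
            if i != test_idx and math.dist(xy, test_xy) > buffer_distance
        ]
        yield train_idx, test_idx


# Hypothetical sample locations in map units (for example, metres):
# one cluster near the origin and one near (100, 100).
coords = [(0, 0), (10, 0), (0, 10), (100, 100), (110, 100)]

# Assumed buffer of 50 map units, chosen only for illustration.
splits = list(spatial_leave_one_out(coords, buffer_distance=50))

for train_idx, test_idx in splits:
    print(f"test sample {test_idx}: train on {train_idx}")
```

With these made-up coordinates, a sample from one cluster is only ever evaluated by a model trained on the other cluster. A plain leave-one-out split would instead keep the held-out sample's close neighbours in the training set, which is the situation the abstract identifies as a source of overestimated accuracy.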