Raphael A. Viscarra Rossel, Thorsten Behrens, Eyal Ben‐Dor, Sabine Chabrillat, José Alexandre Melo Demattê, Yufeng Ge, Cecile Gomez, César Guerrero, Yi Peng, Leonardo Ramirez‐Lopez, Zhou Shi, Bo Stenberg, Richard Webster, Leigh Winowiecki, Zefang Shen.
School of Earth and Planetary Science [Perth - Curtin university], Curtin University [Perth], Planning and Transport Research Centre (PATREC)-Planning and Transport Research Centre (PATREC), Swiss Competence Center for Soils, Tel Aviv University (TAU), German Research Centre for Geosciences - Helmholtz-Centre Potsdam (GFZ), Leibniz Universität Hannover = Leibniz University Hannover, Universidade de São Paulo = University of São Paulo (USP), University of Nebraska–Lincoln, University of Nebraska System, Laboratoire d'étude des Interactions Sol - Agrosystème - Hydrosystème (UMR LISAH), Institut de Recherche pour le Développement (IRD)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut Agro Montpellier, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), Institut de Recherche pour le Développement (IRD), Universidad Miguel Hernández [Elche] (UMH), Food and Agriculture Organization of the United Nations [Rome, Italie] (FAO), BÜCHI Labortechnik AG, Partenaires INRAE, Zhejiang University, Swedish University of Agricultural Sciences (SLU), Rothamsted Research, Biotechnology and Biological Sciences Research Council (BBSRC), World Agroforestry Center [CGIAR, Kenya] (ICRAF), Consultative Group on International Agricultural Research [CGIAR] (CGIAR).
Raphael A. Viscarra Rossel received funding from the Australian Government via grant ACSRIV000077.
Spectroscopic measurements of soil samples are reliable because they are highly repeatable and reproducible. They characterise the samples' mineral-organic composition. Estimates of concentrations of soil constituents are inevitably less precise than estimates obtained conventionally by chemical analysis. But the cost of each spectroscopic estimate is at most one-tenth of the cost of a chemical determination. Spectroscopy is cost-effective when we need many data, despite the costs and errors of calibration. Soil spectroscopists understand the risks of over-fitting models to high-dimensional multivariate spectra and have command of the mathematical and statistical methods to avoid them. Machine learning has fast become an algorithmic alternative to statistical analysis for estimating concentrations of soil constituents from reflectance spectra. As with any modelling, we need judicious implementation of machine learning as it also carries the risk of over-fitting predictions to irrelevant elements of the spectra. To use the methods confidently, we need to validate the outcomes with appropriately sampled, independent data sets. Not all machine learning methods should be considered 'black boxes'. Their interpretability depends on the algorithm, and some are highly interpretable and explainable. Others are difficult to interpret because of complex transformations or their huge and complicated network of parameters. But there is rapidly advancing research on explainable machine learning, and these methods are finding applications in soil science and spectroscopy. In many parts of the world, soil and environmental scientists recognise the merits of soil spectroscopy. They are building spectral libraries on which they can draw to localise the modelling and derive soil information for new projects within their domains. We hope our article gives readers a more balanced and optimistic perspective of soil spectroscopy and its future.
Highlights
Spectroscopy is reliable because it is a highly repeatable and reproducible analytical technique.
Spectra are calibrated to estimate concentrations of soil properties with known error.
Spectroscopy is cost-effective for estimating soil properties.
Machine learning is becoming ever more powerful for extracting accurate information from spectra, and methods for interpreting the models exist.
Large libraries of soil spectra provide information that can be used locally to aid estimates from new samples.
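
The abstract's point about validating a calibration with independent data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' method: the data are simulated, a single hypothetical spectral feature stands in for a full spectrum, and an ordinary least-squares line stands in for whatever statistical or machine-learning model is actually used. The sketch fits the calibration on one sample and reports the mean error and root-mean-square error on a separately drawn sample.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def validation_stats(observed, predicted):
    """Return (mean error, RMSE) of predictions against observations."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


def simulate(n, rng):
    """Hypothetical data: one spectral feature and a noisy soil property."""
    feature = [rng.uniform(0.1, 0.9) for _ in range(n)]
    prop = [2.0 + 5.0 * f + rng.gauss(0.0, 0.3) for f in feature]
    return feature, prop


rng = random.Random(42)             # fixed seed so the sketch is repeatable
x_cal, y_cal = simulate(100, rng)   # calibration sample
x_val, y_val = simulate(50, rng)    # independent validation sample

a, b = fit_line(x_cal, y_cal)       # calibrate on the calibration sample only
pred_val = [a + b * x for x in x_val]
me, rmse = validation_stats(y_val, pred_val)
print(f"validation mean error = {me:.3f}, RMSE = {rmse:.3f}")
```

The statistics are computed only on samples that played no part in fitting, which is what allows the error of the estimates to be stated with some confidence. In practice the validation sample would also need to be drawn appropriately from the population of interest, which this simulation does not attempt to represent.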