1. Forecasting Plantago pollen: improving feature selection through random forests, clustering, and Friedman tests.
- Author
Navares, Ricardo and Aznarte, José Luis
- Subjects
FEATURE selection, POLLEN, PLANTAGO, HEALTH facilities, MACHINE learning
- Abstract
Predicting concentrations of pollen is of great importance both for patients and for public health institutions. In this paper, we present a forecasting approach which relies on data and makes no assumptions on the underlying phenomena affecting the plants and the pollination process. Machine learning is used to build a model and to select the most important variables for prediction. Through nonparametric hypothesis testing, we show how some variables are indeed more important than others and how the careful combination of these variables can lead to more accurate and parsimonious models which avoid the huge computational times of more complex models while outperforming them in terms of the precision of the forecasts. By increasing the richness of the selected variables based on the clustered Friedman importance ranks, prediction error is reduced from 4.57 to 4.40 grains/m³ as an average, which accounts for a 3.5% average improvement across locations studied with a 50% reduction of execution times. [ABSTRACT FROM AUTHOR]
- Published
- 2020
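The ranking step the abstract describes (comparing variable importance across locations with a Friedman test, then grouping variables by their mean rank) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: each location is treated as a Friedman "block" and each candidate variable as a "treatment", the importance matrix and the names `v1`..`v4` are hypothetical, the statistic omits the tie correction, and `group_by_gap` is a simple stand-in for whatever clustering the paper actually uses.

```python
import math

# Hypothetical illustration of the ranking step described in the abstract.
# Assumption: rows are locations (Friedman "blocks"), columns are candidate
# variables ("treatments"), and each cell is a random-forest importance score.


def rank_desc(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df a positive integer.

    Uses the recurrence Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)  # Q(1, y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while s < df / 2:
        q += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1
    return q


def friedman(scores):
    """Friedman statistic (no tie correction), its chi-square p-value, and mean ranks."""
    n, k = len(scores), len(scores[0])
    rank_rows = [rank_desc(row) for row in scores]
    rank_sums = [sum(r[j] for r in rank_rows) for j in range(k)]
    stat = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    mean_ranks = [R / n for R in rank_sums]
    return stat, chi2_sf(stat, k - 1), mean_ranks


def group_by_gap(names, mean_ranks, gap):
    """Hypothetical stand-in for clustering: start a new group wherever
    consecutive sorted mean ranks differ by more than `gap`."""
    pairs = sorted(zip(mean_ranks, names))
    groups = [[pairs[0][1]]]
    for (prev, _), (cur, name) in zip(pairs, pairs[1:]):
        if cur - prev > gap:
            groups.append([])
        groups[-1].append(name)
    return groups


# Made-up importance scores: 4 locations x 4 variables (v1..v4 are placeholders).
names = ["v1", "v2", "v3", "v4"]
scores = [
    [0.40, 0.30, 0.20, 0.10],
    [0.35, 0.38, 0.15, 0.12],
    [0.45, 0.25, 0.10, 0.20],
    [0.50, 0.30, 0.12, 0.08],
]

stat, p_value, mean_ranks = friedman(scores)
groups = group_by_gap(names, mean_ranks, gap=1.0)
print(f"Friedman statistic = {stat:.2f}, p = {p_value:.4f}")
print("mean ranks:", dict(zip(names, mean_ranks)))
print("groups:", groups)
```

With these invented numbers the statistic is 10.2 on 3 degrees of freedom (p about 0.017), and the gap rule splits the variables into `[['v1', 'v2'], ['v3', 'v4']]`. With only four blocks the chi-square approximation is rough, so exact critical values would be preferable for samples this small.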