Analysis of Preprocessing Techniques for Missing Data in the Prediction of Sunflower Yield in Response to the Effects of Climate Change.

Authors :
Călin, Alina Delia
Coroiu, Adriana Mihaela
Mureşan, Horea Bogdan
Source :
Applied Sciences (2076-3417); Jul2023, Vol. 13 Issue 13, p7415, 21p
Publication Year :
2023

Abstract

Featured Application: This research applies to agriculture, where the rapid progression of climate change has drastic effects on crops. An accurate and robust model can help identify the optimum planting day to maximise crop yield, based on the weather forecast and the meteorological conditions of the region. This can support farmers in planning their agricultural activity and help minimise the impact of weather anomalies.

Machine learning is often used to predict crop yield based on the sowing date and weather parameters in non-irrigated crops. In the context of climate change, regression algorithms can help identify correlations and plan agricultural activities to maximise production. For sunflower crops, the datasets we identified are not very large and have many missing values, which results in a low-performance regression model. In this paper, our aim is to study and compare several approaches to missing-value imputation in order to improve our regression model. In our experiments, we compare nine imputation methods, using mean values, similar values, interpolation (linear, spline, pad), and prediction (linear regression, random forest, extreme gradient boosting regressor, and histogram gradient boosting regression). We also apply four unsupervised outlier removal algorithms and assess their influence on the regression model: isolation forest, minimum covariance determinant, local outlier factor and OneClass-SVM. After preprocessing, the obtained datasets are used to build regression models using the extreme gradient boosting regressor and histogram gradient boosting regression, and their performance is compared. The evaluation of the models shows that R² increases from 0.723 when instances with missing data are removed to 0.938 for imputation using Random Forest prediction combined with OneClass-SVM-based outlier removal. [ABSTRACT FROM AUTHOR]
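To make the simpler imputation strategies named in the abstract concrete, the following is a minimal sketch of mean, pad (forward-fill), and linear-interpolation imputation, together with the R² score used for evaluation. It is not the authors' implementation: the function names, the `temps` series, and the `truth` values are all hypothetical and chosen only for illustration, and the sketch uses plain Python rather than any particular library.

```python
from statistics import mean


def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_pad(values):
    """Forward fill: replace each None with the last observed value before it."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)  # remains None if the gap is at the very start
    return out


def impute_linear(values):
    """Linearly interpolate interior gaps between the nearest observed neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out  # leading and trailing gaps are left as None


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical temperature series with gaps (assumption, not data from the paper).
temps = [10.0, None, 14.0, None, None, 20.0]
# Hypothetical "true" values, used only to compare the three strategies.
truth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

for name, imputer in [("mean", impute_mean),
                      ("pad", impute_pad),
                      ("linear", impute_linear)]:
    filled = imputer(temps)
    print(name, filled, round(r2_score(truth, filled), 3))
```

On a series that happens to be linear, as in this toy example, interpolation recovers the gaps exactly while mean and pad imputation do not. Real weather and yield data will not be this tidy, which is the motivation for comparing several strategies empirically, as the abstract describes.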

Details

Language :
English
ISSN :
20763417
Volume :
13
Issue :
13
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
164921139
Full Text :
https://doi.org/10.3390/app13137415