
Median-KNN Regressor-SMOTE-Tomek Links for Handling Missing and Imbalanced Data in Air Quality Prediction.

Authors :
Chandra, Winoto
Suprihatin, Bambang
Resti, Yulia
Source :
Symmetry (20738994); Apr 2023, Vol. 15, Issue 4, p887, 16p
Publication Year :
2023

Abstract

The Air Quality Index (AQI) dataset contains measurements of pollutants and ambient air quality conditions at specific locations, which can be used to predict air quality. Unfortunately, this dataset often has many missing observations and imbalanced classes, and both problems can degrade the performance of a prediction model. Predictions for the minority class are particularly important, because inaccurate predictions for it can be fatal or cause large losses. Moreover, missing data may bias the results. This paper proposes single imputation with the median when 10% or fewer of the values are missing, and multiple imputation with a k-Nearest Neighbor (KNN) regressor when more than 10% are missing. Alongside this, SMOTE-Tomek Links is used to address the class imbalance. These approaches are then applied to air quality prediction on the India AQI dataset using Naive Bayes (NB), KNN, and C4.5 classifiers. The results of five treatments show that the proposed Median-KNN regressor-SMOTE-Tomek Links method improves the performance of the India air quality prediction model; in other words, it succeeds in overcoming both the missing-value and the class-imbalance problems. [ABSTRACT FROM AUTHOR]
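The abstract describes two preprocessing stages that run before the NB, KNN and C4.5 classifiers. The sketch below illustrates them in plain Python under stated assumptions; it is not the authors' implementation. The function names `impute`, `smote` and `remove_tomek_links` are hypothetical, missing values are assumed to be encoded as `None`, and every column is assumed to have at least one observed value. The KNN-regressor step is a single deterministic pass (mean of the k nearest donor rows, unscaled Euclidean distance over co-observed features) standing in for the paper's multiple-imputation procedure. SMOTE assumes at least two minority samples and oversamples one class up to the largest class count, and the Tomek-link step removes both members of each link. The classifiers themselves are not shown.

```python
import math
import random
from collections import Counter
from statistics import median

# Hypothetical sketch of the Median / KNN-regressor / SMOTE-Tomek preprocessing
# described in the abstract; not the authors' code.
# Threshold from the abstract: median if <= 10% missing, KNN regressor if > 10%.
MISSING_THRESHOLD = 0.10


def _partial_distance(a, b, skip):
    """Euclidean distance over features observed in both rows, ignoring column `skip`."""
    shared = [(x, z) for i, (x, z) in enumerate(zip(a, b))
              if i != skip and x is not None and z is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - z) ** 2 for x, z in shared))


def impute(rows, k=3):
    """Hypothetical helper: fill None values column by column and return a copy.

    Assumes every column has at least one observed value.
    """
    rows = [list(r) for r in rows]
    n = len(rows)
    fractions = [sum(r[j] is None for r in rows) / n for j in range(len(rows[0]))]

    # Pass 1: median imputation for lightly missing columns (0 < fraction <= 10%).
    for j, frac in enumerate(fractions):
        if 0 < frac <= MISSING_THRESHOLD:
            fill = median([r[j] for r in rows if r[j] is not None])
            for r in rows:
                if r[j] is None:
                    r[j] = fill

    # Pass 2: KNN regression for heavily missing columns (fraction > 10%):
    # each gap becomes the mean value of the k nearest rows that observe column j.
    for j, frac in enumerate(fractions):
        if frac > MISSING_THRESHOLD:
            donors = [r for r in rows if r[j] is not None]
            preds = {}
            for i, r in enumerate(rows):
                if r[j] is None:
                    nearest = sorted(donors, key=lambda d: _partial_distance(r, d, j))[:k]
                    preds[i] = sum(d[j] for d in nearest) / len(nearest)
            for i, value in preds.items():  # assign after predicting all gaps
                rows[i][j] = value
    return rows


def smote(X, y, minority, k=5, seed=0):
    """Oversample `minority` up to the largest class count by SMOTE interpolation."""
    rng = random.Random(seed)
    minority_X = [x for x, c in zip(X, y) if c == minority]
    n_needed = max(Counter(y).values()) - len(minority_X)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_X)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted((m for m in minority_X if m is not base),
                            key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (opposite-class mutual nearest neighbours)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = {i for i in range(len(X)) if y[i] != y[nn[i]] and nn[nn[i]] == i}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

A plausible order is `impute` on the raw feature rows, then `smote` followed by `remove_tomek_links` on the training split only, so that synthetic samples do not leak into evaluation. For real workloads, scikit-learn's `SimpleImputer(strategy="median")` and imbalanced-learn's `SMOTETomek` provide tested implementations of the median and SMOTE-Tomek steps.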

Details

Language :
English
ISSN :
20738994
Volume :
15
Issue :
4
Database :
Complementary Index
Journal :
Symmetry (20738994)
Publication Type :
Academic Journal
Accession number :
163458191
Full Text :
https://doi.org/10.3390/sym15040887