1. A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data.
- Author
Al-Helali, Baligh; Chen, Qi; Xue, Bing; Zhang, Mengjie
- Subjects
MISSING data (Statistics), GENETIC programming, MACHINE learning, DATA quality, WEIGHTS & measures
- Abstract
Incompleteness is one of the problematic data quality challenges in real-world machine learning tasks. A large number of studies have been conducted for addressing this challenge. However, most of the existing studies focus on the classification task and only a limited number of studies for symbolic regression with missing values exist. In this work, a new imputation method for symbolic regression with incomplete data is proposed. The method aims to improve both the effectiveness and efficiency of imputing missing values for symbolic regression. This method is based on genetic programming (GP) and weighted K-nearest neighbors (KNN). It constructs GP-based models using other available features to predict the missing values of incomplete features. The instances used for constructing such models are selected using weighted KNN. The experimental results on real-world data sets show that the proposed method outperforms a number of state-of-the-art methods with respect to the imputation accuracy, the symbolic regression performance, and the imputation time. [ABSTRACT FROM AUTHOR]
- Published
2021
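The imputation idea summarized in the abstract (select training instances with weighted KNN, then build a model on the other available features to predict the missing value) can be illustrated with a minimal Python sketch. This is not the authors' implementation. An ordinary least-squares fit stands in for the GP-evolved model, the feature weights default to uniform, and only one column is assumed to contain missing values; the function names `weighted_distance`, `fit_least_squares`, and `impute` are hypothetical.

```python
import math


def weighted_distance(a, b, feats, weights):
    """Weighted Euclidean distance between rows a and b over the feature indices in feats."""
    return math.sqrt(sum(weights[j] * (a[j] - b[j]) ** 2 for j in feats))


def fit_least_squares(X, y):
    """Stand-in for the GP regressor (an assumption, not the paper's model):
    ordinary least squares with intercept, solved through the normal
    equations by Gauss-Jordan elimination with partial pivoting."""
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system: neighbours are not varied enough")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    coef = [M[i][n] for i in range(n)]
    return lambda x: coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def impute(data, target, k, weights=None):
    """Return a copy of data with None entries in column target filled in.

    Assumes every other column is fully observed. For each incomplete row,
    the k nearest complete rows (weighted distance on the other features)
    are used to fit a local model that predicts the missing value.
    """
    n_feats = len(data[0])
    predictors = [j for j in range(n_feats) if j != target]
    weights = weights or [1.0] * n_feats  # uniform weights are an assumption
    complete = [r for r in data if r[target] is not None]
    result = [list(r) for r in data]
    for row in result:
        if row[target] is None:
            nearest = sorted(
                complete,
                key=lambda r: weighted_distance(row, r, predictors, weights),
            )[:k]
            model = fit_least_squares(
                [[r[j] for j in predictors] for r in nearest],
                [r[target] for r in nearest],
            )
            row[target] = model([row[j] for j in predictors])
    return result


# Toy data in which the third column equals 2 * first + second.
data = [
    [1, 1, 3], [2, 1, 5], [1, 2, 4], [3, 2, 8],
    [2, 3, 7], [4, 1, 9], [2, 2, None],
]
imputed = impute(data, target=2, k=4)
print(imputed[-1][2])
```

On this toy data the relationship is exactly linear, so the local model recovers a value of about 6 for the missing entry. Replacing `fit_least_squares` with a symbolic regressor would bring the sketch closer to the GP-based approach the abstract describes.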