Big data analytics approaches for treatment of imbalance and missing values problems on high dimensionality dataset.
- Source :
- AIP Conference Proceedings. 2024, Vol. 3150, Issue 1, p1-18. 18p.
- Publication Year :
- 2024
Abstract
- The telecommunications industry faced challenges with its datasets, primarily their high dimensionality, along with imbalanced classes and missing values. When these deficiencies were not handled properly, they led to inaccurate predictions and degraded model performance. Because the churned-customer class was far smaller than the active-customer class, the accuracy paradox arose: even when the model's accuracy reached 90%, that figure merely mirrored the actual class distribution rather than reflecting genuine predictive ability. In addition, the large number of features, many of them redundant or unnecessary, significantly prolonged learning and computation time and hindered the learning process. The purpose of this study was therefore to determine the effect of feature selection, data imputation, and techniques for handling imbalanced data on model performance. The study proposed an improved approach to developing voluntary churn models that combines techniques for handling class imbalance and missing data in a high-dimensional dataset. Compared with the other model combinations, Decision Trees + Mode Imputation + SMOTE with Random Undersampling, with Random Forest as the classifier, produced the highest classification accuracy, AUC, and F1-score. Additionally, the study suggested using Dask or PySpark to process the large telecommunication dataset, allowing other machine learning algorithms in Python to run faster and more effectively through parallel computing. [ABSTRACT FROM AUTHOR]
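The accuracy paradox described in the abstract can be illustrated with a minimal sketch on hypothetical toy labels (90 active customers, 10 churned; these are not the study's data). A degenerate model that always predicts "active" reaches 90% accuracy, yet its F1-score for the churn class is zero. The helper names `accuracy` and `f1_positive` are illustrative only.

```python
# Hypothetical toy labels: 90 active customers (0) and 10 churned customers (1).
y_true = [0] * 90 + [1] * 10

# A degenerate "model" that always predicts the majority class (active).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1-score for the positive (churn) class; 0.0 when precision and recall are both 0."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f"accuracy = {accuracy(y_true, y_pred):.2f}")    # accuracy = 0.90
print(f"churn F1 = {f1_positive(y_true, y_pred):.2f}")  # churn F1 = 0.00
```

The gap between the two numbers illustrates why metrics such as F1-score and AUC are useful alongside accuracy when one class dominates.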
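The mode-imputation and resampling steps named in the abstract can be sketched in plain Python as follows. This is not the authors' implementation: the function names (`mode_impute`, `smote`, `random_undersample`, `rebalance`), the toy data, the neighbour count `k=3`, and the step order (SMOTE first, then random undersampling of the majority to equal class sizes) are all assumptions made for illustration. The imbalanced-learn library provides `SMOTE` and `RandomUnderSampler` classes for these steps in practice.

```python
import random
from collections import Counter


def mode_impute(rows):
    """Replace each None with the most frequent observed value in its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic minority points (needs at least 2 minority rows).

    Each point lies on the segment between a random minority row and one of
    its k nearest minority neighbours (squared Euclidean distance).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def random_undersample(majority, n_keep, rng=random):
    """Keep a random subset of n_keep majority rows, without replacement."""
    return rng.sample(majority, n_keep)


def rebalance(X, y, smote_target, rng=random):
    """Hypothetical pipeline: impute, SMOTE the minority (label 1), undersample the majority (label 0)."""
    X = mode_impute(X)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    minority = minority + smote(minority, smote_target - len(minority), rng=rng)
    majority = random_undersample(majority, len(minority), rng=rng)
    return majority + minority, [0] * len(majority) + [1] * len(minority)


# Hypothetical toy data: 50 customers, 5 churners, missing values in column 0.
rng = random.Random(0)
X = [[rng.choice([1.0, 2.0, None]), rng.random()] for _ in range(50)]
y = [1 if i < 5 else 0 for i in range(50)]
X_bal, y_bal = rebalance(X, y, smote_target=20, rng=rng)
print(y_bal.count(0), y_bal.count(1))  # 20 20
```

Resampling of this kind is normally applied only to the training split, so that synthetic churners do not leak into the data used for evaluation.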
Details
- Language :
- English
- ISSN :
- 0094243X
- Volume :
- 3150
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- AIP Conference Proceedings
- Publication Type :
- Conference
- Accession number :
- 179640277
- Full Text :
- https://doi.org/10.1063/5.0228054