OCtS: an alternative of the t-Score method sensitive to outliers and correlation in feature selection.

Authors :
Demirarslan, Mert
Suner, Aslı
Source :
Communications in Statistics: Simulation & Computation. 2024, Vol. 53 Issue 3, p1409-1422. 14p.
Publication Year :
2024

Abstract

A wide range of issues, including missing values, class noise, class imbalance, outliers, correlation, and irrelevant variables, can negatively affect the overall performance of disease diagnosis classification algorithms. This study proposes a new technique, an alternative to the t-Score method, to increase the performance of ensemble learning classification algorithms by removing irrelevant variables. To this end, three publicly available datasets from the medical domain, varying in their sample sizes, number of variables, and data preprocessing problems, were selected and processed with our newly proposed feature selection method, called Outliers and Correlation t-Score (OCtS). Afterwards, six widely used ensemble learning algorithms (Random Forest, Gradient Boosting Machine, Extreme Gradient Boosting Machine, Light Gradient Boosting Machine, CatBoost, and Bagging) were employed for disease diagnosis classification, and performance metrics were measured. Our results indicate that the classification performance of the six ensemble learning algorithms significantly increased when the OCtS method was employed, and that OCtS exhibited higher performance than the standard t-Score method across all datasets (p = 0.0001). We conclude that using data preprocessing methods with OCtS offers better algorithm performance when employing ensemble learning algorithms in disease diagnosis classification. [ABSTRACT FROM AUTHOR]
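For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of a common two-class t-score feature ranking (absolute difference of class means divided by a Welch-style standard error), not the authors' OCtS method, whose outlier and correlation adjustments are not described in this record. The function names `t_scores` and `select_top_k` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def t_scores(X, y):
    """Per-feature two-class t-score: |mean_a - mean_b| / sqrt(var_a/n_a + var_b/n_b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles binary labels only")
    a = X[y == classes[0]]
    b = X[y == classes[1]]
    mean_diff = np.abs(a.mean(axis=0) - b.mean(axis=0))
    # Sample variances (ddof=1) scaled by class sizes give the standard error.
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    # Features with zero standard error get a score of 0 instead of dividing by zero.
    return np.divide(mean_diff, se, out=np.zeros_like(mean_diff), where=se > 0)

def select_top_k(X, y, k):
    """Indices of the k features with the largest t-scores, best first."""
    scores = t_scores(X, y)
    return np.argsort(scores)[::-1][:k]

# Hypothetical synthetic data: only feature 2 differs between the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 2] += 3.0 * y
top = select_top_k(X, y, k=2)
```

The selected columns, `X[:, top]`, could then be passed to any of the ensemble classifiers named in the abstract. Because the score relies on class means and variances, it is sensitive to outliers and ignores correlation between features, which are the weaknesses the abstract says OCtS targets.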

Details

Language :
English
ISSN :
03610918
Volume :
53
Issue :
3
Database :
Academic Search Index
Journal :
Communications in Statistics: Simulation & Computation
Publication Type :
Academic Journal
Accession number :
175722455
Full Text :
https://doi.org/10.1080/03610918.2022.2046087