
Comparison of multiclass classification techniques using dry bean dataset

Authors :
Md Salauddin Khan
Tushar Deb Nath
Md Murad Hossain
Arnab Mukherjee
Hafiz Bin Hasnath
Tahera Manhaz Meem
Umama Khan
Source :
International Journal of Cognitive Computing in Engineering, Vol 4, Pp 6-20 (2023)
Publication Year :
2023
Publisher :
KeAi Communications Co., Ltd., 2023.

Abstract

Background: Classification methods based on multivariate and machine learning techniques are of great significance in the agricultural sector. Classifying seed types and assessing seed quality is vital, because both have a large impact on crop production. Dry beans show wide genetic variation worldwide. Many previous studies have used various datasets to identify dry bean varieties, but most of them focused on machine learning techniques for binary classification.

Objective: The aim of this study is to find a reliable classifier that is least affected by noise and to establish an effective algorithm for dry bean classification. The paper focuses on outlier removal, oversampling with the Adaptive Synthetic (ADASYN) algorithm, and finding the classifier that achieves the highest possible accuracy.

Methods: The raw dataset was obtained from the UCI Machine Learning Repository. It describes grains by 16 features: 12 dimensional features and 4 shape forms. The interquartile range (IQR) method, implemented in Python, was used to remove outliers from the dataset. Eight popular classifiers were compared on both balanced and imbalanced classes: Logistic Regression (LR), Naïve Bayes (NB), k-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGB), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). Frequency tables, bar diagrams, boxplots, and analysis of variance were used for descriptive analysis and data preprocessing.

Results: The XGB classifier outperformed the other classifiers on both the balanced and the imbalanced class distributions of dry beans, achieving an accuracy (ACC) of 93.0% on the imbalanced classes and 95.4% on the balanced classes.
On the dataset balanced with the ADASYN algorithm, the KNN and RF techniques also performed well in terms of classification accuracy (ACC), sensitivity (SE), specificity (SP), and Cohen's kappa coefficient (Kappa), among other metrics. The most important attributes for classifying the dry beans were found to be ShapeFactor2, Minor Axis Length, and ShapeFactor1, along with EquivDiameter, Roundness, and ConvexArea.

Conclusions: The XGB classifier performed well for classifying dry bean seeds whether the class distribution was balanced or imbalanced, so it is the primary approach for identifying seed/bean classes in either case. If the classes of the target variable are well balanced, the KNN and RF algorithms may be applied alongside XGB for more accurate classification.
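Two generic steps named in the abstract, IQR-based outlier filtering and the evaluation metrics ACC, SE, SP, and Kappa, can be made concrete with a short sketch. This is not the authors' code; it is a minimal pure-Python illustration under stated assumptions. The 1.5 × IQR fence (Tukey's rule), the hypothetical function names `iqr_keep_indices` and `classification_metrics`, and the one-vs-rest definition of per-class sensitivity and specificity are all assumptions, since the abstract does not specify them.

```python
from statistics import quantiles


def iqr_keep_indices(rows, k=1.5):
    """Return indices of rows whose every feature lies inside the fences
    [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 (Tukey's rule) is an assumed choice."""
    bounds = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        # Quartiles by linear interpolation between sorted order statistics
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    # Keep a row only if all of its feature values fall within the fences
    return [
        i for i, row in enumerate(rows)
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
    ]


def classification_metrics(y_true, y_pred, labels):
    """Accuracy, one-vs-rest (sensitivity, specificity) per class, Cohen's kappa."""
    index = {label: i for i, label in enumerate(labels)}
    size = len(labels)
    # cm[t][p] counts samples of true class t predicted as class p
    cm = [[0] * size for _ in range(size)]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1

    n = len(y_true)
    row_sums = [sum(cm[i]) for i in range(size)]
    col_sums = [sum(cm[i][j] for i in range(size)) for j in range(size)]
    accuracy = sum(cm[i][i] for i in range(size)) / n

    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = row_sums[i] - tp  # class i samples predicted as something else
        fp = col_sums[i] - tp  # other samples predicted as class i
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        per_class[label] = (sensitivity, specificity)

    # Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_e is chance agreement
    p_e = sum(row_sums[i] * col_sums[i] for i in range(size)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return accuracy, per_class, kappa
```

Returning indices rather than filtered rows lets the same selection be applied to the class labels, so features and targets stay aligned after outlier removal. Per-class SE and SP could then be macro-averaged into single figures; whether the study averaged them this way is not stated in the abstract.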

Details

Language :
English
ISSN :
2666-3074
Volume :
4
Pages :
6-20
Database :
Directory of Open Access Journals
Journal :
International Journal of Cognitive Computing in Engineering
Publication Type :
Academic Journal
Accession number :
edsdoj.bb2f1543a03a4a1e82b4e8374114b1b3
Document Type :
article
Full Text :
https://doi.org/10.1016/j.ijcce.2023.01.002