Clustering Algorithms in an Educational Context: An Automatic Comparative Approach

Authors :
Danial Hooshyar
Yeongwook Yang
Margus Pedaste
Yueh-Min Huang
Source :
IEEE Access, Vol 8, Pp 146994-147014 (2020)
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Despite an increasing consensus regarding the significance of properly identifying the most suitable clustering method for a given problem, a surprising amount of educational research, including both educational data mining (EDM) and learning analytics (LA), neglects this critical task. This shortcoming could in many cases have a negative impact on the predictive power of both EDM- and LA-based approaches. To address this issue, this work proposes an evaluation approach that automatically compares several clustering methods using multiple internal and external performance measures on nine real-world educational datasets of different sizes, created from the University of Tartu's Moodle system, to produce two-way clustering. Moreover, to investigate the possible effect of normalization on the performance of the clustering algorithms, this work performs the same experiment on a normalized version of the datasets. Since such an exhaustive evaluation includes multiple criteria, the proposed approach employs a multiple-criteria decision-making method (i.e., TOPSIS) to rank the most suitable methods for each dataset. Our results reveal that the proposed approach can automatically compare the performance of the clustering methods and accordingly recommend the most suitable method for each dataset. Furthermore, our results show that in both normalized and nonnormalized datasets of different sizes with 10 features, DBSCAN and k-medoids are the best clustering methods, whereas agglomerative and spectral methods appear to be among the most stable and highest-performing clustering methods for such datasets with 15 features. For datasets with more than 15 features, OPTICS is among the top-ranked algorithms on the nonnormalized datasets, and k-medoids is the best on the normalized datasets. Interestingly, our findings reveal that normalization may have a negative effect on the performance of certain methods, e.g., spectral clustering and OPTICS; however, it appears to have a mostly positive impact on all of the other clustering methods.
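
The ranking step mentioned in the abstract can be illustrated with a minimal sketch of the standard TOPSIS procedure. This is not the authors' implementation: the function names `topsis` and `rank_methods`, the equal weights, the choice of three criteria, and all score values are assumptions made up for illustration, and the method labels are deliberately generic placeholders rather than results from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a TOPSIS closeness score in [0, 1] for each alternative.

    matrix  : one row per alternative, one column per criterion
    weights : one weight per criterion
    benefit : True if higher is better for that criterion, False if lower is
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column (guard against all-zero columns).
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_crit)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)]
        for row in matrix
    ]
    # Ideal and anti-ideal points, taking the criterion direction into account.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


def rank_methods(names, scores):
    """Sort method names from highest to lowest closeness score."""
    return [n for n, _ in sorted(zip(names, scores), key=lambda p: -p[1])]


# Hypothetical values only: columns are an internal measure where higher is
# better, an internal measure where lower is better, and an external measure
# where higher is better.
names = ["method_a", "method_b", "method_c"]
matrix = [
    [0.6, 0.5, 0.7],
    [0.4, 0.9, 0.5],
    [0.5, 0.7, 0.6],
]
weights = [1 / 3, 1 / 3, 1 / 3]   # assumed equal weighting
benefit = [True, False, True]

scores = topsis(matrix, weights, benefit)
ranking = rank_methods(names, scores)
print(ranking)
```

In this toy input `method_a` is best on every criterion and `method_b` is worst on every criterion, so they coincide with the ideal and anti-ideal points respectively. With real evaluation data no method usually dominates, and the closeness score is what trades the criteria off against each other.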

Details

Language :
English
ISSN :
21693536
Volume :
8
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.518aab87f7b04866b6b506af4db80f16
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2020.3014948