Isolation forests and landmarking-based representations for clustering algorithm recommendation using meta-learning.

Authors :
Gabbay, Itay
Shapira, Bracha
Rokach, Lior
Source :
Information Sciences. Oct. 2021, Vol. 574, pp. 473-489. 17 pp.
Publication Year :
2021

Abstract

The data clustering problem can be described as the task of organizing data into groups, where in each group the objects share some similar attributes. Most of the problems clustering algorithms address do not have a prior solution. This paper addresses the algorithm selection challenge for data clustering, while taking the difficulty in evaluating clustering solutions into account. We present a new meta-learning method for recommending the most suitable clustering algorithm for a dataset. Based on concepts from the isolation forest algorithm, we propose a new similarity measure between datasets. Our proposed dataset characterization methods generate an embedding for a dataset using this similarity measure, which is then used to improve the quality of the problem's characterization. The method utilizes landmarking concepts to characterize the dataset and then, inspired by the DeepFM algorithm, applies meta-learning to rank the candidate algorithms that are expected to perform the best for the current dataset. This ranking could, among other things, support AutoML systems. Our approach is evaluated on a corpus of 100 publicly available benchmark datasets. We compare our method's ranking performance to that of existing meta-learning methods and show the dominance of our method in terms of predictive performance and computational complexity. [ABSTRACT FROM AUTHOR]
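The ranking step described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: a plain nearest-neighbour meta-learner stands in for the DeepFM-inspired ranker, and the dataset embeddings are hypothetical two-dimensional vectors rather than embeddings derived from isolation-forest similarity or landmarking. The function name `rank_algorithms`, the dataset names, and all scores are assumptions made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_algorithms(new_embedding, meta_embeddings, meta_scores, k=2):
    """Rank clustering algorithms for a new dataset.

    Assumption: the k meta-training datasets closest to the new dataset
    in embedding space are the best guide to which algorithms will do
    well, so each algorithm is scored by its mean performance on them.
    """
    # Names of the k nearest meta-training datasets.
    nearest = sorted(
        meta_embeddings,
        key=lambda name: euclidean(new_embedding, meta_embeddings[name]),
    )[:k]
    algorithms = meta_scores[nearest[0]].keys()
    mean_score = {
        alg: sum(meta_scores[d][alg] for d in nearest) / len(nearest)
        for alg in algorithms
    }
    # Best expected performer first.
    return sorted(mean_score, key=mean_score.get, reverse=True)


# Hypothetical meta-knowledge: embeddings and per-algorithm quality scores.
meta_embeddings = {
    "ds_a": [0.1, 0.9],
    "ds_b": [0.2, 0.8],
    "ds_c": [0.9, 0.1],
}
meta_scores = {
    "ds_a": {"kmeans": 0.80, "dbscan": 0.40, "agglomerative": 0.60},
    "ds_b": {"kmeans": 0.70, "dbscan": 0.50, "agglomerative": 0.65},
    "ds_c": {"kmeans": 0.30, "dbscan": 0.90, "agglomerative": 0.50},
}

ranking = rank_algorithms([0.15, 0.85], meta_embeddings, meta_scores, k=2)
print(ranking)
```

In this toy setup the new dataset lies closest to `ds_a` and `ds_b`, so the recommendation follows the algorithms that scored well on those two datasets.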

Details

Language :
English
ISSN :
0020-0255
Volume :
574
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
152168747
Full Text :
https://doi.org/10.1016/j.ins.2021.06.033