Less is more: Selecting the right benchmarking set of data for time series classification.

Authors :
Eftimov, Tome
Petelin, Gašper
Cenikj, Gjorgjina
Kostovska, Ana
Ispirova, Gordana
Korošec, Peter
Bogatinovski, Jasmin
Source :
Expert Systems with Applications. Jul2022, Vol. 198, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

In this paper, we have proposed a new pipeline for landscape analysis of time-series machine learning datasets that enables us to better understand a benchmarking problem landscape, allows us to select a diverse benchmark datasets portfolio, and reduces the presence of performance assessment bias via bootstrapping evaluation. Combining a large multi-domain representation corpus of time-series specific features and the results of a large empirical study of time-series classification (TSC) benchmark, we showcase the capability of the pipeline to point out issues with non-redundancy and representativeness in the benchmark. By observing discrepancy between the empirical results of the bootstrap evaluation and recently adopted practices in TSC literature when introducing novel methods, we warn on the potentially harmful effects of tuning the methods on certain parts of the landscape (unless this is an explicit and desired goal of the study). Finally, we propose a set of datasets uniformly distributed across the landscape space one should consider when benchmarking novel TSC methods.

• Complementary landscape analysis of time-series datasets.
• Selecting unbiased benchmark datasets portfolio for comparison study.
• Bootstrapping evaluation for reproducible statistical outcomes.

[ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
09574174
Volume :
198
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
156254382
Full Text :
https://doi.org/10.1016/j.eswa.2022.116871