Streaming Data Analysis: Clustering or Classification?

Authors :: Bezdek, James C.; Keller, James M.
Source :: IEEE Transactions on Systems, Man, and Cybernetics: Systems. Jan 2021, Vol. 51, Issue 1, pp. 91-102. 12 p.
Publication Year :: 2021
Abstract: This article is a position paper about models and algorithms that are generally called “stream clustering.” Semantics and methods used in this field are often co-opted from static clustering, but they do not serve well for streaming data analysis. Most “state-of-the-art” methods, such as sequential k-means, Birch, CluStream, DenStream, etc., acknowledge that the data are seen but once in real streaming analysis (e.g., intrusion detection, voter fraud, etc.). Interpretation of their outputs generally overlooks the fact that when the data cannot be saved, batch clustering ideas, such as preclustering assessment, partitioning, and cluster validity are not relevant. But in the current literature, the data, or some subset of it, are often saved for hindsight evaluation (we call this fake stream clustering). Our position? Useful analysis of real streaming data is in its infancy. We do not argue that current approaches to streaming clustering are wrong: rather, we regard them as transitional methods which will eventually lead to a new and useful paradigm for this type of computation. We think that this class of models and algorithms are actually classifiers, but with a special added component, viz., continuously updated cluster footprints of the instream processing. We need to carefully define the objectives of streaming analysis, and then choose terminology and methods that suit this evolving paradigm. [ABSTRACT FROM AUTHOR]