
Fast Streaming $k$-Means Clustering With Coreset Caching.

Authors :
Zhang, Yu
Tangwongsan, Kanat
Tirthapura, Srikanta
Source :
IEEE Transactions on Knowledge & Data Engineering. Jun 2022, Vol. 34 Issue 6, p2740-2754. 15p.
Publication Year :
2022

Abstract

We present new algorithms for $k$-means clustering on a data stream with a focus on providing fast responses to clustering queries. Compared to the state of the art, our algorithms provide substantial improvements in the query time for cluster-center queries while retaining the desirable properties of provably small approximation error and low space usage. Our proposed clustering algorithms systematically reuse the “coresets” (summaries of data) computed for recent queries in answering the current clustering query, a novel technique that we refer to as coreset caching. We also present an algorithm called OnlineCC that integrates the coreset caching idea with a simple sequential streaming $k$-means algorithm. In practice, the OnlineCC algorithm can provide constant query time. We present both theoretical analysis and detailed experiments demonstrating the correctness, accuracy, and efficiency of all our proposed clustering algorithms. [ABSTRACT FROM AUTHOR]
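The abstract names "a simple sequential streaming $k$-means algorithm" as one ingredient of OnlineCC. As an illustration only, the sketch below assumes the classic MacQueen-style sequential update, in which each arriving point moves its nearest center toward it with step size 1/n (n being that center's point count). The class `SequentialKMeans` and its methods are hypothetical names. The sketch omits coreset caching and anything else OnlineCC may add, so it should not be read as the authors' algorithm.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SequentialKMeans:
    """Hypothetical MacQueen-style sequential k-means (not the paper's OnlineCC)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centers: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, p: Sequence[float]) -> None:
        # The first k arrivals seed the centers directly.
        if len(self.centers) < self.k:
            self.centers.append(list(p))
            self.counts.append(1)
            return
        # Otherwise move the nearest center toward p by step 1/n_j, which keeps
        # each center equal to the running mean of the points assigned to it.
        j = min(range(self.k), key=lambda i: sq_dist(self.centers[i], p))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        c = self.centers[j]
        for d in range(len(c)):
            c[d] += eta * (p[d] - c[d])

    def query(self) -> List[Tuple[float, ...]]:
        # A query only copies the k current centers: O(k * d) work,
        # independent of how many points the stream has delivered so far.
        return [tuple(c) for c in self.centers]


if __name__ == "__main__":
    model = SequentialKMeans(k=2)
    for p in [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]:
        model.update(p)
    print(model.query())  # [(0.0, 1.0), (10.0, 11.0)]
```

Because the centers are maintained as points arrive, answering a query costs time that does not grow with the stream length (for fixed $k$ and dimension). A bare sequential update like this one does not by itself provide a provable approximation bound, however.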

Details

Language :
English
ISSN :
10414347
Volume :
34
Issue :
6
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
156653484
Full Text :
https://doi.org/10.1109/TKDE.2020.3018744