Publication: Fast Streaming k-Means Clustering with Coreset Caching
Issued Date
2020-01-01
Resource Type
ISSN
15582191
10414347
10414347
Other identifier(s)
2-s2.0-85090466944
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
IEEE Transactions on Knowledge and Data Engineering. (2020)
Suggested Citation
Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura Fast Streaming k-Means Clustering with Coreset Caching. IEEE Transactions on Knowledge and Data Engineering. (2020). doi:10.1109/TKDE.2020.3018744 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/59046
Research Projects
Organizational Units
Authors
Journal Issue
Thesis
Title
Fast Streaming k-Means Clustering with Coreset Caching
Author(s)
Other Contributor(s)
Abstract
IEEE We present new algorithms for k-means clustering on a data stream with a focus on providing fast responses to clustering queries. Compared to the state-of-the-art, our algorithms provide substantial improvements in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our proposed clustering algorithms systematically reuse the "coresets" (summaries of data) computed for recent queries in answering the current clustering query, a novel technique which we refer to as coreset caching. We also present an algorithm called OnlineCC that integrates the coreset caching idea with a simple sequential streaming k-means algorithm. In practice, OnlineCC algorithm can provide constant query time. We present both theoretical analysis and detailed experiments demonstrating the correctness, accuracy, and efficiency of all our proposed clustering algorithms.