[PDF] from aaai.org

Approximate k-means++ in sublinear time

Authors

Olivier Bachem, Mario Lucic, S Hamed Hassani, Andreas Krause

Publication date

2016/2/21

Journal

Proceedings of the AAAI conference on artificial intelligence

Description

The quality of K-Means clustering is extremely sensitive to proper initialization. The classic remedy is to apply k-means++ to obtain an initial set of centers that is provably competitive with the optimal solution. Unfortunately, k-means++ requires k full passes over the data, which limits its applicability to massive datasets. We address this problem by proposing a simple and efficient seeding algorithm for K-Means clustering. The main idea is to replace the exact D²-sampling step in k-means++ with a substantially faster approximation based on Markov Chain Monte Carlo sampling. We prove that, under natural assumptions on the data, the proposed algorithm retains the full theoretical guarantees of k-means++ while its computational complexity is only sublinear in the number of data points. For such datasets, one can thus obtain a provably good clustering in sublinear time. Extensive experiments confirm that the proposed method is competitive with k-means++ on a variety of real-world, large-scale datasets while offering a reduction in runtime of several orders of magnitude.
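
The main idea in the description can be illustrated with a minimal sketch. It assumes a uniform proposal distribution and a Metropolis-Hastings acceptance rule; the names `mcmc_seeding`, `sq_dist_to_centers`, and `chain_length` are hypothetical and chosen here for illustration, and this is not the authors' reference implementation.

```python
import random


def sq_dist_to_centers(x, centers):
    """Squared Euclidean distance from point x to its nearest center."""
    return min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)


def mcmc_seeding(points, k, chain_length, rng=None):
    """Pick k seed centers, approximating D^2-sampling with a short Markov chain.

    Assumptions (not taken from the source page): proposals are drawn
    uniformly at random, and chain_length is a user-chosen constant.
    """
    rng = rng or random.Random()
    n = len(points)

    # First center: a point chosen uniformly at random, as in k-means++.
    centers = [points[rng.randrange(n)]]

    for _ in range(k - 1):
        # Start the chain at a uniformly chosen point.
        x = points[rng.randrange(n)]
        dx = sq_dist_to_centers(x, centers)

        for _ in range(chain_length - 1):
            # Propose another uniformly chosen point.
            y = points[rng.randrange(n)]
            dy = sq_dist_to_centers(y, centers)

            # Accept with probability min(1, dy / dx), so the chain drifts
            # toward points that are far from the current centers.
            if dx == 0.0 or dy / dx > rng.random():
                x, dx = y, dy

        # The final state of the chain becomes the next center.
        centers.append(x)

    return centers
```

In this sketch each new center inspects only `chain_length` points rather than the whole dataset, so the seeding cost depends on `k` and `chain_length` but not on the number of data points. A call such as `mcmc_seeding(points, k=3, chain_length=20)` returns a list of three points drawn from `points`.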

Total citations

Cited by 174

Citations per year: 2015: 1, 2016: 7, 2017: 12, 2018: 24, 2019: 20, 2020: 21, 2021: 20, 2022: 33, 2023: 19, 2024: 16

Scholar articles

Approximate k-means++ in sublinear time

O Bachem, M Lucic, SH Hassani, A Krause - Proceedings of the AAAI conference on artificial …, 2016

Cited by 170 · Related articles · All 11 versions

K-mc2: approximate k-means++ in sublinear time*

O Bachem, M Lucic, H Hassani, A Krause - AAAI 2016, 2016

Cited by 6 · Related articles