A Nearly Tight Analysis of Greedy k-means++
July 16, 2022 Β· Declared Dead Β· π ACM-SIAM Symposium on Discrete Algorithms
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Christoph Grunau, Ahmet Alper ΓzΓΌdoΔru, VΓ‘clav RozhoΕ, Jakub TΔtek
arXiv ID
2207.07949
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.LG
Citations
15
Venue
ACM-SIAM Symposium on Discrete Algorithms
Last Checked
3 months ago
Abstract
The famous $k$-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the $k$-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random and each of the following $k-1$ centers is then always sampled proportional to its squared distance to the closest center so far. Afterward, Lloyd's iterative algorithm is run. The $k$-means++ algorithm is known to return a $Ξ(\log k)$ approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for its following \emph{greedy} variant: in every step, we sample $\ell$ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how $k$-means++ is implemented in e.g. the popular Scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and upper bounds for the greedy $k$-means++: We prove that it is an $O(\ell^3 \log^3 k)$-approximation algorithm. On the other hand, we prove a lower bound of $Ξ©(\ell^3 \log^3 k / \log^2(\ell\log k))$. Previously, only an $Ξ©(\ell \log k)$ lower bound was known [Bhattacharya, Eube, RΓΆglin, Schmidt; ESA 2020] and there was no known upper bound.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted