Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal
April 14, 2020 Β· Declared Dead Β· π Symposium on the Theory of Computing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Lingxiao Huang, Nisheeth K. Vishnoi
arXiv ID
2004.06263
Category
cs.CG: Computational Geometry
Cross-listed
cs.DC,
cs.DS
Citations
83
Venue
Symposium on the Theory of Computing
Last Checked
1 month ago
Abstract
Given a collection of $n$ points in $\mathbb{R}^d$, the goal of the $(k,z)$-clustering problem is to find a subset of $k$ "centers" that minimizes the sum of the $z$-th powers of the Euclidean distance of each point to the closest center. Special cases of the $(k,z)$-clustering problem include the $k$-median and $k$-means problems. Our main result is a unified two-stage importance sampling framework that constructs an $\varepsilon$-coreset for the $(k,z)$-clustering problem. Compared to the results for $(k,z)$-clustering in [Feldman and Langberg, STOC 2011], our framework saves a $\varepsilon^2 d$ factor in the coreset size. Compared to the results for $(k,z)$-clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a $\operatorname{poly}(k)$ factor in the coreset size and avoids the $\exp(k/\varepsilon)$ term in the construction time. Specifically, our coreset for $k$-median ($z=1$) has size $\tilde{O}(\varepsilon^{-4} k)$ which, when compared to the result in [Sohler and Woodruff, STOC 2018], saves a $k$ factor in the coreset size. Our algorithmic results rely on a new dimensionality reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. We also provide a size lower bound of $Ξ©\left(k\cdot \min \left\{2^{z/20},d \right\}\right)$ for a $0.01$-coreset for $(k,z)$-clustering, which has a linear dependence of size on $k$ and an exponential dependence on $z$ that matches our algorithmic results.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Computational Geometry
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Dynamic Planar Convex Hull
R.I.P.
π»
Ghosted
TEMPO: Feature-Endowed TeichmΓΌller Extremal Mappings of Point Clouds
R.I.P.
π»
Ghosted
Explainable Artificial Intelligence for Manufacturing Cost Estimation and Machining Feature Visualization
R.I.P.
π»
Ghosted
Momen(e)t: Flavor the Moments in Learning to Classify Shapes
R.I.P.
π»
Ghosted
Dynamic Planar Voronoi Diagrams for General Distance Functions and their Algorithmic Applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted