A Nearly Optimal and Agnostic Algorithm for Properly Learning a Mixture of k Gaussians, for any Constant k
June 03, 2015 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jerry Li, Ludwig Schmidt
arXiv ID
1506.01367
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.IT,
cs.LG,
math.ST
Citations
12
Venue
arXiv.org
Last Checked
4 months ago
Abstract
Learning a Gaussian mixture model (GMM) is a fundamental problem in machine learning, learning theory, and statistics. One notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the density $f$ of the unknown distribution from which we draw samples. The distance between $\mathcal{M}$ and $f$ is typically measured in the total variation or $L_1$-norm. We give an algorithm for learning a mixture of $k$ univariate Gaussians that is nearly optimal for any fixed $k$. The sample complexity of our algorithm is $\tilde{O}(\frac{k}{Ξ΅^2})$ and the running time is $(k \cdot \log\frac{1}Ξ΅)^{O(k^4)} + \tilde{O}(\frac{k}{Ξ΅^2})$. It is well-known that this sample complexity is optimal (up to logarithmic factors), and it was already achieved by prior work. However, the best known time complexity for proper learning a $k$-GMM was $\tilde{O}(\frac{1}{Ξ΅^{3k-1}})$. In particular, the dependence between $\frac{1}Ξ΅$ and $k$ was exponential. We significantly improve this dependence by replacing the $\frac{1}Ξ΅$ term with a $\log \frac{1}Ξ΅$ while only increasing the exponent moderately. Hence, for any fixed $k$, the $\tilde{O} (\frac{k}{Ξ΅^2})$ term dominates our running time, and thus our algorithm runs in time which is nearly-linear in the number of samples drawn. Achieving a running time of $\textrm{poly}(k, \frac{1}Ξ΅)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Moreover, our approach offers an agnostic learning guarantee: our algorithm returns a good GMM even if the distribution we are sampling from is not a mixture of Gaussians. To the best of our knowledge, our algorithm is the first agnostic proper learning algorithm for GMMs.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted