Efficient Truncated Statistics with Unknown Truncation

August 02, 2019 · Declared Dead · 🏛 IEEE Annual Symposium on Foundations of Computer Science

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Vasilis Kontonis, Christos Tzamos, Manolis Zampetakis
arXiv ID: 1908.01034
Category: math.ST
Cross-listed: cs.DS, cs.LG, stat.CO, stat.ML
Citations: 31
Venue: IEEE Annual Symposium on Foundations of Computer Science
Last Checked: 3 months ago

Abstract

We study the problem of estimating the parameters of a Gaussian distribution when samples are only shown if they fall in some (unknown) subset $S \subseteq \mathbb{R}^d$. This core problem in truncated statistics has a long history going back to Galton, Lee, Pearson and Fisher. Recent work by Daskalakis et al. (FOCS'18) provides the first efficient algorithm that works for arbitrary sets in high dimension when the set is known, but leaves as an open problem the more challenging and relevant case of an unknown truncation set. Our main result is a computationally and sample-efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets whose performance decays with a natural measure of complexity of the set, namely its Gaussian surface area. Notably, this algorithm works for large families of sets including intersections of halfspaces, polynomial threshold functions and general convex sets. We show that our algorithm closely captures the tradeoff between the complexity of the set and the number of samples needed to learn the parameters by exhibiting a set with small Gaussian surface area for which it is information-theoretically impossible to learn the true Gaussian with few samples.
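
To make the problem setting concrete, here is a minimal one-dimensional sketch of why truncation matters. It is not the paper's algorithm: it only simulates samples from a standard Gaussian that are shown when they fall in a set S, and shows that the naive sample mean and standard deviation of the surviving samples are biased. The helper name `sample_truncated` and the choice S = [0, inf) are assumptions made purely for illustration.

```python
import random
import statistics


def sample_truncated(mu, sigma, in_set, n, seed=0):
    """Rejection-sample n draws from N(mu, sigma^2) that fall inside the set S.

    `in_set` is a membership test for S; the estimator in the paper's setting
    would not have access to it, but the simulator needs it to hide samples.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        x = rng.gauss(mu, sigma)
        if in_set(x):  # the sample is only "shown" if it lands in S
            kept.append(x)
    return kept


# Illustrative (hypothetical) truncation set: S = [0, inf).
samples = sample_truncated(0.0, 1.0, lambda x: x >= 0.0, n=20000)

# Naive estimates that ignore the truncation.
naive_mean = statistics.fmean(samples)
naive_std = statistics.pstdev(samples)

# The true parameters are mean 0 and standard deviation 1. For this S the
# truncated distribution is half-normal, with mean sqrt(2/pi) (about 0.80)
# and standard deviation sqrt(1 - 2/pi) (about 0.60), so both naive
# estimates are far from the truth.
print(f"naive mean: {naive_mean:.3f}, naive std: {naive_std:.3f}")
```

With a known S one can correct for this bias; the abstract's contribution concerns the harder case where S itself is unknown and only the surviving samples are available.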

📜 Similar Papers

In the same crypt — math.ST

R.I.P. 👻 Ghosted

Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber

math.ST 🏛 Annals of Statistics 📚 949 cites 8 years ago

R.I.P. 👻 Ghosted

An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists

Frédéric Chazal, Bertrand Michel

math.ST 🏛 AI 📚 727 cites 8 years ago

R.I.P. 👻 Ghosted

Minimax Optimal Procedures for Locally Private Estimation

John Duchi, Martin Wainwright, Michael Jordan

math.ST 🏛 arXiv 📚 481 cites 10 years ago

R.I.P. 👻 Ghosted

Optimal Best Arm Identification with Fixed Confidence

Aurélien Garivier, Emilie Kaufmann

math.ST 🏛 COLT 📚 384 cites 10 years ago

R.I.P. 👻 Ghosted

Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees

Yudong Chen, Martin J. Wainwright

math.ST 🏛 arXiv 📚 329 cites 10 years ago

R.I.P. 👻 Ghosted

User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient

Arnak S. Dalalyan, Avetik G. Karagulyan

math.ST 🏛 Stochastic Processes and their Applications 📚 319 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago