The Structure of Optimal Private Tests for Simple Hypotheses

November 27, 2018 · Declared Dead · 🏛 Symposium on the Theory of Computing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, Jonathan Ullman arXiv ID 1811.11148 Category cs.DS: Data Structures & Algorithms Cross-listed cs.CR, cs.IT, cs.LG, stat.ML Citations 86 Venue Symposium on the Theory of Computing Last Checked 2 months ago

Abstract

Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions $P$ and $Q$, and a privacy level $\varepsilon$, how many i.i.d. samples are needed to distinguish $P$ from $Q$ subject to $\varepsilon$-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of $P$ and $Q$ and the privacy level $\varepsilon$, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma in the setting of private hypothesis testing. We also give an application of our result to the private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.