Optimal PAC Bounds Without Uniform Convergence

April 18, 2023 · Declared Dead · 🏛 IEEE Annual Symposium on Foundations of Computer Science

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, Nikita Zhivotovskiy arXiv ID 2304.09167 Category cs.LG: Machine Learning Cross-listed cs.DS, math.ST Citations 20 Venue IEEE Annual Symposium on Foundations of Computer Science Last Checked 3 months ago

Abstract

In statistical learning theory, determining the sample complexity of realizable binary classification for VC classes was a long-standing open problem. The results of Simon and Hanneke established sharp upper bounds in this setting. However, the reliance of their argument on the uniform convergence principle limits its applicability to more general learning settings such as multiclass classification. In this paper, we address this issue by providing optimal high probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments. Our framework converts the leave-one-out error of permutation invariant predictors into high probability risk bounds. As an application, by adapting the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we propose an algorithm that achieves an optimal PAC bound for binary classification. Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal, addressing a variant of a classic question posed by Warmuth. We further instantiate our framework in three settings where uniform convergence is provably suboptimal. For multiclass classification, we prove an optimal risk bound that scales with the one-inclusion hypergraph density of the class, addressing the suboptimality of the analysis of Daniely and Shalev-Shwartz. For partial hypothesis classification, we determine the optimal sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman, and Moran. For realizable bounded regression with absolute loss, we derive an optimal risk bound that relies on a modified version of the scale-sensitive dimension, refining the results of Bartlett and Long. Our rates surpass standard uniform convergence-based results due to the smaller complexity measure in our risk bound.