High probability generalization bounds for uniformly stable algorithms with nearly optimal rate
February 27, 2019 · Declared Dead · Annual Conference Computational Learning Theory
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Vitaly Feldman, Jan Vondrak
arXiv ID
1902.10710
Category
cs.LG: Machine Learning
Cross-listed
cs.DS, stat.ML
Citations
170
Venue
Annual Conference Computational Learning Theory
Last Checked
1 month ago
Abstract
Algorithmic stability is a classical approach to understanding and analyzing the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability was established in a landmark paper of Bousquet and Elisseeff (2002), albeit at the expense of an additional $\sqrt{n}$ factor in the bound. Specifically, their bound on the estimation error of any $\gamma$-uniformly stable learning algorithm with range in $[0,1]$ on $n$ samples is $O(\gamma\sqrt{n \log(1/\delta)} + \sqrt{\log(1/\delta)/n})$ with probability $\geq 1-\delta$. The $\sqrt{n}$ overhead makes the bound vacuous in the common settings where $\gamma \geq 1/\sqrt{n}$. A stronger bound was recently proved by the authors (Feldman and Vondrak, 2018) that reduces the overhead to at most $O(n^{1/4})$. Still, both of these results give optimal generalization bounds only when $\gamma = O(1/n)$. We prove a nearly tight bound of $O(\gamma\log(n)\log(n/\delta) + \sqrt{\log(1/\delta)/n})$ on the estimation error of any $\gamma$-uniformly stable algorithm. It implies that for algorithms that are uniformly stable with $\gamma = O(1/\sqrt{n})$, the estimation error is essentially the same as the sampling error. Our result leads to the first high-probability generalization bounds for multi-pass stochastic gradient descent and regularized ERM for stochastic convex problems with nearly optimal rate, resolving open problems in prior work. Our proof technique is new and we introduce several analysis tools that may find additional applications.
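For context, here is the standard definition the abstract refers to (our notation, following Bousquet and Elisseeff (2002), not taken from this page): a learning algorithm $A$ is $\gamma$-uniformly stable if for every pair of datasets $S, S'$ of $n$ samples differing in a single example,
$$\sup_{z} \left| \ell(A(S), z) - \ell(A(S'), z) \right| \leq \gamma,$$
and its estimation error on $S = (z_1, \ldots, z_n)$ drawn i.i.d. from a distribution $P$ is the gap
$$\left| \mathbb{E}_{z \sim P}\left[ \ell(A(S), z) \right] - \frac{1}{n} \sum_{i=1}^{n} \ell(A(S), z_i) \right|.$$
Plugging $\gamma = 1/\sqrt{n}$ into the bounds above illustrates the improvement: the Bousquet and Elisseeff term $\gamma\sqrt{n \log(1/\delta)}$ is already $\Omega(1)$, whereas the new term $\gamma\log(n)\log(n/\delta)$ is $\tilde{O}(1/\sqrt{n})$, matching the $\sqrt{\log(1/\delta)/n}$ sampling error up to logarithmic factors.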
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Machine Learning
👻 Ghosted · XGBoost: A Scalable Tree Boosting System
👻 Ghosted · Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
👻 Ghosted · Semi-Supervised Classification with Graph Convolutional Networks
👻 Ghosted · Proximal Policy Optimization Algorithms
👻 Ghosted · Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Died the same way · 👻 Ghosted
Language Models are Few-Shot Learners
You Only Look Once: Unified, Real-Time Object Detection
A Unified Approach to Interpreting Model Predictions