New Convergence Aspects of Stochastic Gradient Algorithms
November 10, 2018 Β· Declared Dead Β· π Journal of machine learning research
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Lam M. Nguyen, Phuong Ha Nguyen, Peter RichtΓ‘rik, Katya Scheinberg, Martin TakΓ‘Δ, Marten van Dijk
arXiv ID
1811.12403
Category
math.OC: Optimization & Control
Cross-listed
cs.LG
Citations
72
Venue
Journal of machine learning research
Last Checked
4 months ago
Abstract
The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective function is strongly convex. In Bottou et al. (2018), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. We show that for stochastic problems arising in machine learning such bound always holds; and we also propose an alternative convergence analysis of SGD with diminishing learning rate regime. We then move on to the asynchronous parallel setting, and prove convergence of Hogwild! algorithm in the same regime in the case of diminished learning rate. It is well-known that SGD converges if a sequence of learning rates $\{Ξ·_t\}$ satisfies $\sum_{t=0}^\infty Ξ·_t \rightarrow \infty$ and $\sum_{t=0}^\infty Ξ·^2_t < \infty$. We show the convergence of SGD for strongly convex objective function without using bounded gradient assumption when $\{Ξ·_t\}$ is a diminishing sequence and $\sum_{t=0}^\infty Ξ·_t \rightarrow \infty$. In other words, we extend the current state-of-the-art class of learning rates satisfying the convergence of SGD.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Optimization & Control
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Local SGD Converges Fast and Communicates Little
R.I.P.
π»
Ghosted
On Lazy Training in Differentiable Programming
π
π
The Cartographer
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
R.I.P.
π»
Ghosted
Learned Primal-dual Reconstruction
R.I.P.
π»
Ghosted
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted