On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

August 16, 2018 ยท Declared Dead ยท ๐Ÿ› Trans. Mach. Learn. Res.

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu arXiv ID 1808.05671 Category cs.LG: Machine Learning Cross-listed math.OC, stat.ML Citations 164 Venue Trans. Mach. Learn. Res. Last Checked 4 months ago
Abstract
Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods in expectation converge to a first-order stationary point. Our convergence rate is better than existing results for adaptive gradient methods in terms of dimension. In addition, we also prove high probability bounds on the convergence rates of AMSGrad, RMSProp as well as AdaGrad, which have not been established before. Our analyses shed light on better understanding the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted