Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
May 02, 2016 · Declared Dead · Innovations in Theoretical Computer Science (ITCS)
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ioannis Panageas, Georgios Piliouras
arXiv ID
1605.00405
Category
math.DS
Cross-listed
cs.LG
Citations
154
Venue
Innovations in Theoretical Computer Science (ITCS)
Last Checked
1 month ago
Abstract
Given a non-convex twice differentiable cost function f, we prove that the set of initial conditions from which gradient descent converges to saddle points where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT 2016]. Moreover, this result extends to forward-invariant convex subspaces, allowing for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce an upper bound on the allowable step-size.
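The result is easy to probe numerically. The sketch below is my own illustrative example, not code from the paper: plain gradient descent on f(x, y, z) = (x^2 - 1)^2 / 4 + y^2 / 2, whose critical set is non-isolated, with a line of strict saddles along {x = 0, y = 0} and lines of minimizers along {x = ±1, y = 0}. With a step size below 1/L on a bounded forward-invariant region (here L = 11 on |x| ≤ 2), random initializations settle on the minimizer lines and avoid the saddle line, matching what the theorem predicts for almost every starting point.

```python
import numpy as np

# Illustrative function (my choice, not from the paper):
#   f(x, y, z) = (x^2 - 1)^2 / 4 + y^2 / 2
# Its critical set is non-isolated: {x = 0, y = 0} is a line of strict
# saddles (Hessian diag(-1, 1, 0)) and {x = +/-1, y = 0} are lines of
# minimizers, since f is constant along z.

def grad(p):
    x, y, z = p
    return np.array([x**3 - x, y, 0.0])

# On the forward-invariant region |x| <= 2 the gradient is L-Lipschitz with
# L = max(|3x^2 - 1|, 1) = 11, so any step size below 1/L ~ 0.09 qualifies.
alpha = 0.05

rng = np.random.default_rng(0)
for _ in range(5):
    p = rng.uniform(-1.5, 1.5, size=3)   # random initial condition
    for _ in range(10_000):
        p = p - alpha * grad(p)          # plain gradient descent
    print(np.round(p, 4))                # x -> +/-1, y -> 0, z frozen
```

Every run ends on one of the minimizer lines (x = ±1, y = 0, z equal to its initial value); landing on the saddle line {x = 0, y = 0} would require a measure-zero set of initializations, which is exactly the paper's claim in the non-isolated setting.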
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · math.DS
Linearly-Recurrent Autoencoder Networks for Learning Dynamics · 👻 Ghosted
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces · 👻 Ghosted
From rate distortion theory to metric mean dimension: variational principle · 👻 Ghosted
Double variational principle for mean dimension · 👻 Ghosted
Discovering conservation laws from data for control · 👻 Ghosted
Died the same way · 👻 Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System