Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

September 23, 2018 ยท Declared Dead ยท ๐Ÿ› Annual Conference Computational Learning Theory

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Ohad Shamir arXiv ID 1809.08587 Category cs.LG: Machine Learning Cross-listed cs.NE, math.OC, stat.ML Citations 49 Venue Annual Conference Computational Learning Theory Last Checked 3 months ago
Abstract
We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted