On the Quality of the Initial Basin in Overspecified Neural Networks

November 13, 2015 · Declared Dead · 🏛 International Conference on Machine Learning

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Itay Safran, Ohad Shamir arXiv ID 1511.04210 Category cs.LG: Machine Learning Cross-listed stat.ML Citations 129 Venue International Conference on Machine Learning Last Checked 3 months ago

Abstract

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the \emph{geometric} structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger ("overspecified") networks, which accords with some recent empirical and theoretical observations.