Global Optimality in Tensor Factorization, Deep Learning, and Beyond

June 24, 2015 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Benjamin D. Haeffele, Rene Vidal
arXiv ID: 1506.07540
Category: math.NA (Numerical Analysis)
Cross-listed: cs.LG, stat.ML
Citations: 151
Venue: arXiv.org
Last checked: 1 month ago

Abstract

Techniques involving factorization are found in a wide range of applications and have enjoyed significant empirical success in many fields. However, common to a vast majority of these problems is the significant disadvantage that the associated optimization problems are typically non-convex due to a multilinear form or other convexity-destroying transformation. Here we build on ideas from convex relaxations of matrix factorizations and present a very general framework which allows for the analysis of a wide range of non-convex factorization problems, including matrix factorization, tensor factorization, and deep neural network training formulations. We derive sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum and show that if the size of the factorized variables is large enough, then from any initialization it is possible to find a global minimizer using a purely local descent algorithm. Our framework also provides a partial theoretical justification for the increasingly common use of Rectified Linear Units (ReLUs) in deep neural networks and offers guidance on deep network architectures and regularization strategies to facilitate efficient optimization.
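The simplest case the abstract mentions, matrix factorization, can be illustrated numerically. The sketch below is not the authors' code and does not reproduce their framework. It assumes one particular regularizer (squared Frobenius norms on the factors, whose minimum over all factorizations of X equals the nuclear norm of X when the factors are wide enough) and plain gradient descent as the local descent method. All function names and parameter values are hypothetical, chosen only for illustration. The script compares the objective reached by gradient descent on the non-convex factored problem with the optimum of the corresponding convex problem, which is known in closed form via singular value soft-thresholding.

```python
import numpy as np

def factored_objective(Y, U, V, lam):
    """Non-convex objective: 0.5*||Y - U V^T||_F^2 + 0.5*lam*(||U||_F^2 + ||V||_F^2)."""
    residual = Y - U @ V.T
    return 0.5 * np.sum(residual**2) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2))

def convex_optimum(Y, lam):
    """Optimal value of min_X 0.5*||Y - X||_F^2 + lam*||X||_* (nuclear norm).

    The minimizer soft-thresholds the singular values of Y by lam.
    """
    s = np.linalg.svd(Y, compute_uv=False)
    shrunk = np.maximum(s - lam, 0.0)
    return 0.5 * np.sum((s - shrunk) ** 2) + lam * np.sum(shrunk)

def gradient_descent(Y, r, lam, step=0.01, iters=20000, seed=0):
    """Plain gradient descent on the factored objective from a small random start."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        residual = U @ V.T - Y
        grad_U = residual @ V + lam * U
        grad_V = residual.T @ U + lam * V
        U, V = U - step * grad_U, V - step * grad_V
    return U, V

# Illustrative data: an 8x6 Gaussian matrix (an assumption, not from the paper).
Y = np.random.default_rng(1).standard_normal((8, 6))
lam = 1.0

# r = 6 = min(m, n), so the factors are at least as wide as any candidate solution.
U, V = gradient_descent(Y, r=6, lam=lam)
f_factored = factored_objective(Y, U, V, lam)
f_convex = convex_optimum(Y, lam)
print(f"factored objective: {f_factored:.4f}")
print(f"convex optimum:     {f_convex:.4f}")
```

The factored objective can never fall below the convex optimum, so a small gap between the two printed values indicates that the local method ended near a global minimizer on this instance. This is a single numerical check under the stated assumptions, not evidence for the paper's general sufficient conditions.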



📜 Similar Papers

In the same crypt — Numerical Analysis

R.I.P. 👻 Ghosted

Solving high-dimensional partial differential equations using deep learning

Jiequn Han, Arnulf Jentzen, Weinan E

math.NA 🏛 PNAS 📚 1.9K cites 8 years ago

R.I.P. 👻 Ghosted

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

Weinan E, Jiequn Han, Arnulf Jentzen

math.NA 🏛 Communications in Mathematics and Statistics 📚 872 cites 8 years ago

R.I.P. 👻 Ghosted

PDE-Net: Learning PDEs from Data

Zichao Long, Yiping Lu, ... (+2 more)

math.NA 🏛 ICML 📚 832 cites 8 years ago

R.I.P. 👻 Ghosted

Efficient tensor completion for color image and video recovery: Low-rank tensor train

Johann A. Bengua, Ho N. Phien, ... (+2 more)

math.NA 🏛 IEEE TIP 📚 436 cites 9 years ago

R.I.P. 👻 Ghosted

Tensor Ring Decomposition

Qibin Zhao, Guoxu Zhou, ... (+3 more)

math.NA 🏛 arXiv 📚 427 cites 9 years ago

R.I.P. 👻 Ghosted

Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

Christian Beck, Weinan E, Arnulf Jentzen

math.NA 🏛 Journal of nonlinear science 📚 353 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago