Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
June 13, 2019 ยท Declared Dead ยท ๐ International Conference on Learning Representations
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Kaifeng Lyu, Jian Li
arXiv ID
1906.05890
Category
cs.LG: Machine Learning
Cross-listed
cs.NE,
stat.ML
Citations
380
Venue
International Conference on Learning Representations
Last Checked
3 months ago
Abstract
In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study the gradient descent or gradient flow (i.e., gradient descent with infinitesimal step size) optimizing the logistic loss or cross-entropy loss of any homogeneous model (possibly non-smooth), and show that if the training loss decreases below a certain threshold, then we can define a smoothed version of the normalized margin which increases over time. We also formulate a natural constrained optimization problem related to margin maximization, and prove that both the normalized margin and its smoothed version converge to the objective value at a KKT point of the optimization problem. Our results generalize the previous results for logistic regression with one-layer or multi-layer linear networks, and provide more quantitative convergence results with weaker assumptions than previous results for homogeneous smooth neural networks. We conduct several experiments to justify our theoretical finding on MNIST and CIFAR-10 datasets. Finally, as margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
๐ป
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
๐ป
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
๐ป
Ghosted
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
R.I.P.
๐ป
Ghosted
A Unified Approach to Interpreting Model Predictions
R.I.P.
๐ป
Ghosted