R.I.P.
👻
Ghosted
Neural networks trained with SGD learn distributions of increasing complexity
November 21, 2022 · Entered Twilight · International Conference on Machine Learning
Repo contents: .gitignore, README.md, cdatasets.py, censoring.py, dist_inc_comp.png, dist_inc_comp.py, models.py, test_censoring.py, utils.py, vit.py
Authors: Maria Refinetti, Alessandro Ingrosso, Sebastian Goldt
arXiv ID: 2211.11567
Category: Machine Learning (stat.ML)
Cross-listed: cond-mat.dis-nn, cond-mat.stat-mech, cs.LG
Citations: 54
Venue: International Conference on Machine Learning
Repository: https://github.com/sgoldt/dist_inc_comp
Stars: 8
Last Checked: 1 month ago
Abstract
The ability of deep neural networks to generalise well even when they interpolate their training data has been explained using various "simplicity biases". These theories postulate that neural networks avoid overfitting by first learning simple functions, say a linear classifier, before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity biases is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a neural network trained on synthetic data. We empirically demonstrate DSB in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.
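The "lower-order input statistics" probe described in the abstract relies on replacing a class's inputs with samples from a Gaussian that matches only their mean and covariance. A minimal numpy sketch of that construction on toy data (illustrative only; `gaussian_clone` is a hypothetical helper, not the repository's cdatasets.py or censoring.py code):

```python
import numpy as np

def gaussian_clone(X, seed=None):
    """Sample from a multivariate Gaussian matching the empirical
    mean and covariance of X (one class of a dataset), discarding
    all higher-order statistics."""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return rng.multivariate_normal(mu, cov, size=len(X))

# toy 3-D data with a shifted mean and non-trivial covariance
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(5000, 3)) @ A.T + np.array([1.0, -2.0, 0.5])

# the clone shares X's first two moments but nothing beyond them
clone = gaussian_clone(X, seed=1)
```

Training the same network on the original data and on its Gaussian clone, then comparing predictions early in training, is the kind of test used to detect the distributional simplicity bias.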
Similar Papers
In the same crypt · Machine Learning (Stat)
Distilling the Knowledge in a Neural Network
Layer Normalization
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Domain-Adversarial Training of Neural Networks