StrassenNets: Deep Learning with a Multiplication Budget
December 11, 2017 ยท Entered Twilight ยท ๐ International Conference on Machine Learning
"Last commit was 7.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: PosterStrassenNets.pdf, README.md, StrassenNetsCIFAR10.ipynb, StrassenNetworksCIFAR10_VGG.py, common, langmod
Authors
Michael Tschannen, Aran Khanna, Anima Anandkumar
arXiv ID
1712.03942
Category
cs.LG: Machine Learning
Cross-listed
cs.CV
Citations
30
Venue
International Conference on Machine Learning
Repository
https://github.com/mitscha/strassennets
โญ 47
Last Checked
1 month ago
Abstract
A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data. The SPNs disentangle multiplication and addition operations and enable us to impose a budget on the number of multiplication operations. Combining our method with knowledge distillation and applying it to image classification DNNs (trained on ImageNet) and language modeling DNNs (using LSTMs), we obtain a first-of-a-kind reduction in number of multiplications (over 99.5%) while maintaining the predictive performance of the full-precision models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen's matrix multiplication algorithm, learning to multiply $2 \times 2$ matrices using only 7 multiplications instead of 8.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
๐ป
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
๐ป
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
๐ป
Ghosted