Entropy-Constrained Training of Deep Neural Networks

December 18, 2018 · Declared Dead · 🏛 IEEE International Joint Conference on Neural Networks

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Simon Wiedemann, Arturo Marban, Klaus-Robert Müller, Wojciech Samek

arXiv ID: 1812.07520

Category: cs.LG (Machine Learning)

Cross-listed: cs.NE, stat.ML

Citations: 30

Venue: IEEE International Joint Conference on Neural Networks

Last Checked: 3 months ago

Abstract

We propose a general framework for neural network compression that is motivated by the Minimum Description Length (MDL) principle. To that end, we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the compression techniques proposed in the literature, in that pruning or reducing the cardinality of the weight elements of the network can be seen as special cases of entropy-minimization techniques. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient-based optimization techniques. Finally, we show that we can reach state-of-the-art compression results on different network architectures and data sets, e.g. achieving ×71 compression gains on a VGG-like architecture.
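To make the abstract's link between entropy, bit-size, pruning, and weight cardinality concrete, here is a minimal sketch. It uses the standard first-order Shannon entropy of the empirical weight distribution, which is an assumption on our part and not necessarily the exact expression derived in the paper. The helper names (`empirical_entropy_bits`, `estimated_size_bits`, `quantize`, `prune`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def empirical_entropy_bits(weights):
    """Shannon entropy, in bits per weight, of the empirical value distribution."""
    counts = Counter(weights)
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def estimated_size_bits(weights):
    """Rough size estimate: number of weights times entropy per weight.

    Assumption: an ideal entropy coder; codebook overhead is ignored.
    """
    return len(weights) * empirical_entropy_bits(weights)


def quantize(weights, step):
    """Reduce cardinality by snapping each weight to a multiple of `step`."""
    return [round(w / step) * step for w in weights]


def prune(weights, threshold):
    """Set weights with magnitude below `threshold` to zero."""
    return [0.0 if abs(w) < threshold else w for w in weights]


# Toy weight vector (made up for illustration, not taken from the paper).
weights = [0.12, -0.48, 0.03, 0.51, -0.02, 0.27, -0.26, 0.01]
quantized = quantize(weights, 0.25)
pruned = prune(quantized, 0.3)

for name, w in [("raw", weights), ("quantized", quantized), ("pruned", pruned)]:
    print(name, empirical_entropy_bits(w), estimated_size_bits(w))
```

On this toy vector the entropy falls from 3 bits per weight (eight distinct values) to 2 bits after quantization, and to roughly 1.06 bits after additionally pruning the small values. Both operations concentrate probability mass on fewer values, which is the sense in which they act as entropy-reducing steps.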



📜 Similar Papers

In the same crypt — Machine Learning

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago

R.I.P. 👻 Ghosted

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N. Kipf, Max Welling

cs.LG 🏛 ICLR 📚 33.5K cites 9 years ago

R.I.P. 👻 Ghosted

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, ... (+3 more)

cs.LG 🏛 arXiv 📚 25.1K cites 8 years ago

R.I.P. 👻 Ghosted

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, ... (+7 more)

cs.LG 🏛 JMLR 📚 24.4K cites 6 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, ... (+2 more)

cs.CV 🏛 CVPR 📚 43.4K cites 10 years ago

R.I.P. 👻 Ghosted

A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee

cs.AI 🏛 NeurIPS 📚 30.8K cites 9 years ago

R.I.P. 👻 Ghosted

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, ... (+3 more)

cs.CV 🏛 CVPR 📚 30.2K cites 10 years ago