Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy

November 15, 2017 ยท Declared Dead ยท ๐Ÿ› International Conference on Learning Representations

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Asit Mishra, Debbie Marr arXiv ID 1711.05852 Category cs.LG: Machine Learning Cross-listed cs.CV, cs.NE Citations 349 Venue International Conference on Learning Representations Last Checked 3 months ago
Abstract
Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top performing models is deployment on resource constrained inference systems - the models (often deep networks or wide networks or both) are compute and memory intensive. Low-precision numerics and model compression using knowledge distillation are popular techniques to lower both the compute requirements and memory footprint of these deployed models. In this paper, we study the combination of these two techniques and show that the performance of low-precision networks can be significantly improved by using knowledge distillation techniques. Our approach, Apprentice, achieves state-of-the-art accuracies using ternary precision and 4-bit precision for variants of ResNet architecture on ImageNet dataset. We present three schemes using which one can apply knowledge distillation techniques to various stages of the train-and-deploy pipeline.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted