Teaching What You Should Teach: A Data-Based Distillation Method

December 11, 2022 · Declared Dead · 🏛 International Joint Conference on Artificial Intelligence

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu arXiv ID 2212.05422 Category cs.CV: Computer Vision Citations 6 Venue International Joint Conference on Artificial Intelligence Last Checked 3 months ago

Abstract

In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming a good one overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework, and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. To be specific, we design a neural network-based data augmentation module with priori bias, which assists in finding what meets the teacher's strengths but the student's weaknesses, by learning magnitudes and probabilities to generate suitable data samples. By training the data augmentation module and the generalized distillation paradigm in turn, a student model is learned with excellent generalization ability. To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-10, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.