Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Hierarchical multi-attention transfer for knowledge distillation

J Gou, L Sun, B Yu, S Wan, D Tao - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Knowledge distillation (KD) is a powerful and widely applicable technique for the
compression of deep learning models. The main idea of knowledge distillation is to transfer …

Knowledge distillation in deep learning and its applications

A Alkhulaifi, F Alsahli, I Ahmad - PeerJ Computer Science, 2021 - peerj.com
Deep learning-based models are relatively large, and it is hard to deploy such models on
resource-limited devices such as mobile phones and embedded devices. One possible …

Knowledge distillation via softmax regression representation learning

J Yang, B Martinez, A Bulat, G Tzimiropoulos - 2021 - qmro.qmul.ac.uk
This paper addresses the problem of model compression via knowledge distillation. We
advocate for a method that optimizes the output feature of the penultimate layer of the …
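
The snippet above refers to distilling the penultimate-layer representation rather than only the logits. As a rough illustration of that kind of feature-level objective, the PyTorch sketch below projects the student's penultimate feature to the teacher's dimension and applies an L2 loss; the linear projector, the class name, and the loss choice are assumptions for illustration, not the exact formulation of the listed paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMatchingKD(nn.Module):
    """Generic penultimate-layer feature distillation (illustrative sketch)."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Linear projector to align the student's feature dimension with the teacher's.
        self.projector = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        projected = self.projector(student_feat)
        # L2 (mean-squared error) between the projected student feature and the
        # detached teacher feature, so no gradients flow into the teacher.
        return F.mse_loss(projected, teacher_feat.detach())
```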

Knowledge distillation with the reused teacher classifier

D Chen, JP Mei, H Zhang, C Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation aims to compress a powerful yet cumbersome teacher model
into a lightweight student model without much sacrifice of performance. For this purpose …

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …
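
Several snippets in this list describe the same underlying mechanism: a compact student network is trained to reproduce the softened output distribution of a larger teacher. A minimal sketch of that standard soft-target loss is given below, assuming a PyTorch setup; the function name, temperature, and weighting defaults are illustrative choices, not values taken from any of the listed papers.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic soft-target distillation objective (illustrative sketch)."""
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label term: KL divergence between temperature-softened student and
    # teacher distributions, scaled by T^2 to keep gradient magnitudes comparable.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    return alpha * ce + (1.0 - alpha) * kl
```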

Annealing knowledge distillation

A Jafari, M Rezagholizadeh, P Sharma… - arXiv preprint arXiv …, 2021 - arxiv.org
The significant memory and computational requirements of large deep neural networks restrict
their deployment on edge devices. Knowledge distillation (KD) is a prominent model …

ALP-KD: Attention-based layer projection for knowledge distillation

P Passban, Y Wu, M Rezagholizadeh… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Knowledge distillation is considered a training and compression strategy in
which two neural networks, namely a teacher and a student, are coupled together during …

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
In recent years, deep neural models have been successful in almost every field, even
solving the most complex problems. However, these models are huge in size, with …

Prune your model before distill it

J Park, A No - European Conference on Computer Vision, 2022 - Springer
Knowledge distillation transfers the knowledge from a cumbersome teacher to a
small student. Recent results suggest that a student-friendly teacher is more appropriate to …
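
The entry above argues that a pruned, "student-friendly" teacher can make a better distillation source than the original dense teacher. As a hedged sketch of what a prune-then-distill pipeline might look like in PyTorch, the helper below sparsifies the teacher's linear and convolutional weights with the built-in torch.nn.utils.prune utilities before the usual distillation step; the L1 criterion and the 30% ratio are assumptions, not the procedure proposed in the paper.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_teacher(teacher: nn.Module, amount: float = 0.3) -> nn.Module:
    """Sparsify the teacher in place before distillation (illustrative sketch)."""
    for module in teacher.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            # Zero out the smallest-magnitude weights in each layer.
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Fold the pruning mask into the weight tensor permanently.
            prune.remove(module, "weight")
    return teacher
```

After pruning, the teacher would be used exactly as in a standard distillation run, for example to produce the teacher_logits consumed by a loss such as the kd_loss sketch above.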