Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
… A general teacher-student framework for knowledge distillation is shown in Fig. 1. … different
categories of knowledge for knowledge distillation. A vanilla knowledge distillation uses the …
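
As a point of reference for the "vanilla" variant mentioned in this snippet, here is a minimal PyTorch sketch of the standard soft-label distillation loss (temperature-softened teacher targets plus ground-truth cross-entropy). The temperature T=4.0 and weight alpha=0.9 are illustrative assumptions, not values taken from the survey.

import torch
import torch.nn.functional as F

def vanilla_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student distributions, scaled by T^2.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard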

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021 - ieeexplore.ieee.org
… is to form a better teacher model from the student without additional … In such a situation,
the knowledge from the input … This paper is about knowledge distillation (KD) and student-teacher …

Improved knowledge distillation via teacher assistant

SI Mirzadeh, M Farajtabar, A Li, N Levine… - Proceedings of the AAAI …, 2020 - aaai.org
… network and a pre-trained large one as a teacher, both fixed and (wrongly) presumed to …
we propose a new distillation framework called Teacher Assistant Knowledge Distillation (TAKD)…
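
The snippet describes TAKD's core idea: an intermediate-size teacher assistant bridges the capacity gap, so distillation runs in two chained stages rather than one. A hedged sketch of that chaining is below; the distill helper, the temperature, and the equal loss weighting are illustrative assumptions rather than the paper's exact recipe.

import torch
import torch.nn.functional as F

def distill(teacher, student, loader, optimizer, T=4.0, epochs=1):
    # One distillation stage: a frozen teacher supervises the student through
    # temperature-softened predictions plus the usual supervised loss.
    teacher.eval()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.kl_div(
                F.log_softmax(s_logits / T, dim=1),
                F.softmax(t_logits / T, dim=1),
                reduction="batchmean",
            ) * (T * T) + F.cross_entropy(s_logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student

# TAKD chains two such stages instead of distilling directly:
#   assistant = distill(teacher, assistant, loader, opt_a)
#   student   = distill(assistant, student, loader, opt_s)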

Modeling teacher-student techniques in deep neural networks for knowledge distillation

S Abbasi, M Hajabdollahi, N Karimi… - … on Machine Vision …, 2020 - ieeexplore.ieee.org
Knowledge distillation (KD) is a new method for transferring knowledge of a structure under …
small model (named the student) using soft labels produced by a complex model (named the teacher) …

On the efficacy of knowledge distillation

JH Cho, B Hariharan - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
teachers often don’t make good teachers, we attempt to tease apart the factors that affect
knowledge distillation … We find crucially that larger models do not often make better teachers. …

Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc
… settings, where the teacher models and training … distill better from a stronger teacher. We
empirically find that the discrepancy of predictions between the student and a stronger teacher …
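
The snippet points at a prediction discrepancy that grows as the teacher gets stronger. One way to tolerate that gap is to match relations between predictions rather than exact probabilities, e.g. a row-wise Pearson-correlation loss; the sketch below is an assumption-laden illustration of that idea, not necessarily the loss proposed in the paper.

import torch

def pearson_corr(a, b, eps=1e-8):
    # Row-wise Pearson correlation between two batches of prediction vectors.
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    a = a / (a.norm(dim=1, keepdim=True) + eps)
    b = b / (b.norm(dim=1, keepdim=True) + eps)
    return (a * b).sum(dim=1)

def relation_loss(student_probs, teacher_probs):
    # Preserve the teacher's relative ranking of classes instead of forcing
    # the student to reproduce the exact probability values.
    return (1.0 - pearson_corr(student_probs, teacher_probs)).mean()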

Efficient Knowledge Distillation from an Ensemble of Teachers.

T Fukuda, M Suzuki, G Kurata, S Thomas, J Cui… - Interspeech, 2017 - isca-archive.org
knowledge distillation using teacher-student training for building accurate and compact neural
networks. We show that with knowledge … multiple teacher labels for training student models…
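
For the multi-teacher setting described here, one simple way to form the student's soft targets is to average the teachers' temperature-softened posteriors, as sketched below. Averaging is only one possible combination scheme; the function names and temperature are assumptions, not the specific strategies evaluated in the paper.

import torch
import torch.nn.functional as F

def ensemble_soft_targets(teacher_logits_list, T=2.0):
    # Average the temperature-softened posteriors of several teachers.
    probs = [F.softmax(t / T, dim=1) for t in teacher_logits_list]
    return torch.stack(probs, dim=0).mean(dim=0)

def ensemble_kd_loss(student_logits, teacher_logits_list, T=2.0):
    # Distill from the combined soft label exactly as from a single teacher.
    target = ensemble_soft_targets(teacher_logits_list, T)
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        target,
        reduction="batchmean",
    ) * (T * T)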

Knowledge distillation with the reused teacher classifier

D Chen, JP Mei, H Zhang, C Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
… In this paper, we present a simple knowledge distillation technique and demonstrate that it
… between teacher and student models with no need for elaborate knowledge representations. …
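
The technique named in the title is to reuse the teacher's final classifier on top of the student. A minimal sketch of that arrangement follows, assuming the student's features are projected to the teacher's feature dimension and trained to match the teacher's penultimate features; the class name, the linear projector, and the MSE feature loss are illustrative assumptions about the details.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentWithReusedClassifier(nn.Module):
    # Student backbone plus a small projector, topped by the teacher's frozen classifier.
    def __init__(self, student_backbone, student_dim, teacher_dim, teacher_classifier):
        super().__init__()
        self.backbone = student_backbone
        self.projector = nn.Linear(student_dim, teacher_dim)
        self.classifier = teacher_classifier
        for p in self.classifier.parameters():
            p.requires_grad = False  # the reused classifier stays fixed

    def forward(self, x):
        feat = self.projector(self.backbone(x))
        return feat, self.classifier(feat)

# Training signal: align student features with the teacher's penultimate
# features, e.g. loss = F.mse_loss(student_feat, teacher_feat).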

Student customized knowledge distillation: Bridging the gap between student and teacher

Y Zhu, Y Wang - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
teachers do not make better students due to the capacity mismatch. To this end, we present
a novel adaptive knowledge distillation … as Student Customized Knowledge Distillation (…

Learning student-friendly teacher networks for knowledge distillation

DY Park, MH Cha, D Kim, B Han - Advances in neural …, 2021 - proceedings.neurips.cc
knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to
a student. … effective training of student models given pretrained teachers, we aim to learn the …