Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc
Unlike existing knowledge distillation methods, which focus on baseline settings where the
teacher models and training strategies are not as strong and competitive as state-of-the-art …
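
The snippet ends before the method itself; DIST's core idea is to relax the exact KL match between teacher and student predictions to a correlation-based match, so a much stronger teacher only needs to be matched up to scale and shift. A minimal PyTorch-style sketch of that idea (temperature, weighting, and function names are illustrative assumptions, not the paper's exact formulation):

```python
# Hedged sketch: correlation-based logit matching in the spirit of DIST.
import torch
import torch.nn.functional as F

def pearson_residual(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """1 - Pearson correlation, computed row-wise and averaged."""
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    corr = (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + eps)
    return (1.0 - corr).mean()

def dist_like_loss(student_logits, teacher_logits, tau: float = 4.0):
    p_s = F.softmax(student_logits / tau, dim=-1)   # [batch, classes]
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    inter = pearson_residual(p_s, p_t)              # class relations per sample
    intra = pearson_residual(p_s.t(), p_t.t())      # sample relations per class
    return inter + intra
```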

Automated knowledge distillation via Monte Carlo tree search

L Li, P Dong, Z Wei, Y Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we present Auto-KD, the first automated search framework for optimal
knowledge distillation design. Traditional distillation techniques typically require handcrafted …

Student customized knowledge distillation: Bridging the gap between student and teacher

Y Zhu, Y Wang - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
Knowledge distillation (KD) transfers the dark knowledge from cumbersome
networks (teacher) to lightweight (student) networks and expects the student to achieve …
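
As a reference point for the "dark knowledge" transfer this abstract describes, a minimal sketch of the standard softened-softmax KD objective such methods build on (temperature and mixing weight are illustrative):

```python
# Soften both logit distributions with a temperature, penalize their KL divergence,
# and blend with the usual cross-entropy on ground-truth labels.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, tau: float = 4.0, alpha: float = 0.5):
    soft_targets = F.softmax(teacher_logits / tau, dim=-1)
    log_probs = F.log_softmax(student_logits / tau, dim=-1)
    distill = F.kl_div(log_probs, soft_targets, reduction="batchmean") * tau * tau
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```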

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …

Decoupled knowledge distillation

B Zhao, Q Cui, R Song, Y Qiu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …
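
The truncated abstract argues that logit distillation is undervalued; decoupled KD splits the classical logit loss into a target-class term and a non-target-class term that can be weighted separately. A hedged sketch of that decomposition (the masking and default weights are assumptions based on my reading, not taken from the snippet):

```python
# Split the softened prediction into a target/non-target binary part (TCKD) and the
# distribution over non-target classes (NCKD), then weight the two KL terms separately.
import torch
import torch.nn.functional as F

def decoupled_kd_loss(student_logits, teacher_logits, labels,
                      tau: float = 4.0, alpha: float = 1.0, beta: float = 8.0):
    gt_mask = F.one_hot(labels, student_logits.size(-1)).float()

    p_s = F.softmax(student_logits / tau, dim=-1)
    p_t = F.softmax(teacher_logits / tau, dim=-1)

    # TCKD: binary distribution over {target class, all non-target classes}.
    b_s = torch.stack([(p_s * gt_mask).sum(-1), (p_s * (1 - gt_mask)).sum(-1)], dim=-1)
    b_t = torch.stack([(p_t * gt_mask).sum(-1), (p_t * (1 - gt_mask)).sum(-1)], dim=-1)
    tckd = F.kl_div(b_s.clamp_min(1e-8).log(), b_t, reduction="batchmean")

    # NCKD: distribution over non-target classes only (target logit masked out).
    log_nt_s = F.log_softmax(student_logits / tau - 1e9 * gt_mask, dim=-1)
    nt_t = F.softmax(teacher_logits / tau - 1e9 * gt_mask, dim=-1)
    nckd = F.kl_div(log_nt_s, nt_t, reduction="batchmean")

    return (alpha * tckd + beta * nckd) * tau * tau
```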

One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation

Z Hao, J Guo, K Han, Y Tang, H Hu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Knowledge distillation (KD) has proven to be a highly effective approach for
enhancing model performance through a teacher-student training scheme. However, most …

ALP-KD: Attention-based layer projection for knowledge distillation

P Passban, Y Wu, M Rezagholizadeh… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Knowledge distillation is considered a training and compression strategy in
which two neural networks, namely a teacher and a student, are coupled together during …
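
A hedged sketch of the attention-based layer projection named in the title: instead of pairing each student layer with a single teacher layer, the student hidden state attends over all teacher layers and is regressed onto the attention-weighted combination (the pooling, matching dimensions, and MSE objective are assumptions):

```python
import torch
import torch.nn.functional as F

def alp_style_loss(student_h: torch.Tensor, teacher_hs: torch.Tensor) -> torch.Tensor:
    """
    student_h : [batch, dim]            one student layer's pooled hidden state
    teacher_hs: [batch, layers, dim]    pooled hidden states of every teacher layer
    """
    # Attention weights from student-teacher similarity over teacher layers.
    scores = torch.einsum("bd,bld->bl", student_h, teacher_hs) / student_h.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)                       # [batch, layers]
    fused = torch.einsum("bl,bld->bd", weights, teacher_hs)   # weighted teacher summary
    return F.mse_loss(student_h, fused)
```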

Knowledge diffusion for distillation

T Huang, Y Zhang, M Zheng, S You… - Advances in …, 2023 - proceedings.neurips.cc
The representation gap between teacher and student is an emerging topic in knowledge
distillation (KD). To reduce the gap and improve the performance, current methods often …

Class attention transfer based knowledge distillation

Z Guo, H Yan, H Li, X Lin - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Previous knowledge distillation methods have shown impressive performance on
model compression tasks; however, it is hard to explain how the knowledge they transferred …
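
A hedged sketch of class attention transfer as the title suggests: derive CAM-style per-class attention maps from each network's final feature map, normalize them, and have the student mimic the teacher's maps (the normalization and matching loss here are assumptions):

```python
import torch
import torch.nn.functional as F

def class_attention_maps(feat: torch.Tensor, class_weights: torch.Tensor) -> torch.Tensor:
    """
    feat          : [batch, channels, H, W]  backbone feature map
    class_weights : [classes, channels]      classifier weights (CAM projection)
    returns       : [batch, classes, H*W]    L2-normalized per-class attention maps
    """
    cams = torch.einsum("kc,bchw->bkhw", class_weights, feat)
    return F.normalize(cams.flatten(2), dim=-1)

def cat_style_loss(student_feat, student_cls_w, teacher_feat, teacher_cls_w):
    # Spatial sizes must match; interpolate the student feature map if they differ.
    if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
        student_feat = F.interpolate(student_feat, size=teacher_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
    return F.mse_loss(class_attention_maps(student_feat, student_cls_w),
                      class_attention_maps(teacher_feat, teacher_cls_w))
```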

Lipschitz continuity guided knowledge distillation

Y Shang, B Duan, Z Zong, L Nie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Knowledge distillation has become one of the most important model compression
techniques by distilling knowledge from larger teacher networks to smaller student ones …
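
A hedged reconstruction of what Lipschitz-continuity guidance could look like: bound each network's Lipschitz constant by the product of layer-wise spectral norms (estimated with power iteration) and penalize the student for drifting from the teacher's bound. This is an illustrative sketch, not the paper's exact loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def spectral_norm_estimate(weight: torch.Tensor, iters: int = 5) -> torch.Tensor:
    # Power iteration on the flattened weight matrix; returns the largest singular value.
    w = weight.reshape(weight.size(0), -1)
    v = torch.randn(w.size(1), device=w.device)
    for _ in range(iters):
        u = F.normalize(w @ v, dim=0)
        v = F.normalize(w.t() @ u, dim=0)
    return torch.dot(u, w @ v)

def log_lipschitz_bound(model: nn.Module) -> torch.Tensor:
    # Log of the product of layer-wise spectral norms over linear/conv layers.
    logs = [spectral_norm_estimate(m.weight).clamp_min(1e-8).log()
            for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    return torch.stack(logs).sum()

def lipschitz_gap_loss(student: nn.Module, teacher: nn.Module) -> torch.Tensor:
    with torch.no_grad():
        target = log_lipschitz_bound(teacher)
    return (log_lipschitz_bound(student) - target).pow(2)
```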