Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Linkless link prediction via relational distillation

Z Guo, W Shiao, S Zhang, Y Liu… - International …, 2023 - proceedings.mlr.press
Abstract Graph Neural Networks (GNNs) have shown exceptional performance in the task of
link prediction. Despite their effectiveness, the high latency brought by non-trivial …

Asymmetric temperature scaling makes larger networks teach well again

XC Li, WS Fan, S Song, Y Li… - Advances in neural …, 2022 - proceedings.neurips.cc
Abstract Knowledge Distillation (KD) aims at transferring the knowledge of a well-performed
neural network (the teacher) to a weaker one (the student). A peculiar phenomenon …
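
For orientation, the setup this abstract refers to is usually the temperature-scaled distillation objective of Hinton et al., in which teacher and student logits are softened with a shared temperature before a KL term is combined with the ordinary task loss. The sketch below is a minimal PyTorch-style rendering of that common baseline only; it does not reproduce the paper's asymmetric temperature variant, and the function name kd_loss and the values of T and alpha are illustrative choices.

# Minimal sketch of standard temperature-scaled knowledge distillation
# (Hinton-style baseline; the asymmetric-temperature method of the paper
# above is NOT reproduced here). kd_loss, T and alpha are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soften both distributions with the same temperature T.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # KL between softened teacher and student, rescaled by T^2 as usual.
    distill = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
    # Ordinary cross-entropy against the hard labels.
    task = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * task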

A Survey of Knowledge Distillation Research in Deep Learning

邵仁荣, 刘宇昂, 张伟, 王骏 - Chinese Journal of Computers (计算机学报), 2022 - 159.226.43.17
Abstract With the rapid development of artificial intelligence today, deep neural networks have been widely applied across many research fields and achieved great success,
but they also face numerous challenges. First, in order to solve complex problems and improve model training performance …

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Abstract Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …
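
As a purely hypothetical sketch of the idea suggested by the title, excluding some classes from the distillation signal, the snippet below masks out selected classes in an elementwise KL term so the student is not pushed to imitate the teacher on them. This is not the method proposed in the paper; masked_kd_loss, keep_mask and T are illustrative names and values.

# Hypothetical sketch: drop selected classes from the distillation term by
# zeroing their elementwise KL contributions. Illustrates "not distilling
# every class"; this is NOT the paper's actual method.
import torch
import torch.nn.functional as F

def masked_kd_loss(student_logits, teacher_logits, keep_mask, T=4.0, eps=1e-12):
    # keep_mask: float tensor of shape (num_classes,), 1 = distill, 0 = skip.
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    # Elementwise KL contributions p_t * (log p_t - log p_s), one per class.
    per_class = p_t * (torch.log(p_t + eps) - log_p_s)
    per_class = per_class * keep_mask  # zero out the excluded classes
    return per_class.sum(dim=1).mean() * (T ** 2)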

DOT: A Distillation-Oriented Trainer

B Zhao, Q Cui, R Song, J Liang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract Knowledge distillation transfers knowledge from a large model to a small one via
task and distillation losses. In this paper, we observe a trade-off between task and distillation …
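
The trade-off mentioned here can be made concrete by comparing the gradients the two losses send to the student: if the cosine similarity between the task-loss gradient and the distillation-loss gradient is negative, the two objectives pull the parameters in conflicting directions. The sketch below is an illustrative diagnostic under that reading, not the DOT trainer itself; gradient_conflict and T are assumed names and values.

# Illustrative diagnostic (not the paper's DOT trainer): measure how the task
# (cross-entropy) gradient and the distillation (KL) gradient interact on the
# student's parameters. Negative cosine similarity indicates conflict.
import torch
import torch.nn.functional as F

def gradient_conflict(student, inputs, labels, teacher_logits, T=4.0):
    logits = student(inputs)
    task_loss = F.cross_entropy(logits, labels)
    distill_loss = F.kl_div(
        F.log_softmax(logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T ** 2)
    params = [p for p in student.parameters() if p.requires_grad]
    g_task = torch.autograd.grad(task_loss, params, retain_graph=True)
    g_dist = torch.autograd.grad(distill_loss, params)
    flatten = lambda grads: torch.cat([g.reshape(-1) for g in grads])
    return F.cosine_similarity(flatten(g_task), flatten(g_dist), dim=0)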

Weakly supervised semantic segmentation via alternate self-dual teaching

D Zhang, H Li, W Zeng, C Fang… - … on Image Processing, 2023 - ieeexplore.ieee.org
Weakly supervised semantic segmentation (WSSS) is a challenging yet important research
field in vision community. In WSSS, the key problem is to generate high-quality pseudo …

Do not blindly imitate the teacher: Using perturbed loss for knowledge distillation

R Zhang, J Shen, T Liu, J Liu, M Bendersky… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation is a popular technique to transfer knowledge from large teacher
models to a small student model. Typically, the student learns to imitate the teacher by …

Teacher's pet: understanding and mitigating biases in distillation

M Lukasik, S Bhojanapalli, AK Menon… - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge distillation is widely used as a means of improving the performance of a
relatively simple student model using the predictions from a complex teacher model. Several …

Learning from biased soft labels

H Yuan, Y Shi, N Xu, X Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Since the advent of knowledge distillation, many researchers have been intrigued by the
dark knowledge hidden in the soft labels generated by the teacher model. This …
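
As a concrete illustration of what this abstract calls dark knowledge: the teacher's soft labels place non-zero probability on classes other than the ground truth, and the relative sizes of those non-target probabilities encode inter-class similarity. The toy example below uses made-up logits (not from the paper) and simply prints the softened teacher distribution at two temperatures.

# Toy illustration of "dark knowledge" in soft labels: a softened teacher
# output puts more mass on classes it deems similar to the target.
# The logits below are invented for illustration.
import torch
import torch.nn.functional as F

classes = ["cat", "dog", "truck"]
teacher_logits = torch.tensor([6.0, 3.5, 0.5])  # an image of a cat

for T in (1.0, 4.0):
    probs = F.softmax(teacher_logits / T, dim=0)
    print(f"T={T}: " + ", ".join(f"{c}={p:.3f}" for c, p in zip(classes, probs)))
# Higher temperature exposes the non-target structure (dog >> truck),
# which is the extra signal a student can learn beyond the hard label.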