Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Linkless link prediction via relational distillation

Z Guo, W Shiao, S Zhang, Y Liu… - International …, 2023 - proceedings.mlr.press
Abstract Graph Neural Networks (GNNs) have shown exceptional performance in the task of
link prediction. Despite their effectiveness, the high latency brought by non-trivial …

Asymmetric temperature scaling makes larger networks teach well again

XC Li, WS Fan, S Song, Y Li… - Advances in neural …, 2022 - proceedings.neurips.cc
Abstract Knowledge Distillation (KD) aims at transferring the knowledge of a well-performed
neural network (the teacher) to a weaker one (the student). A peculiar phenomenon …
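
For orientation, the setup this abstract refers to is usually the temperature-scaled distillation objective of Hinton et al., in which teacher and student logits are softened with a shared temperature before a KL term is combined with the ordinary task loss. The sketch below is a minimal PyTorch-style rendering of that common baseline only; it does not reproduce the paper's asymmetric temperature variant, and the function name kd_loss and the values of T and alpha are illustrative choices.

# Minimal sketch of standard temperature-scaled knowledge distillation
# (Hinton-style baseline; the asymmetric-temperature method of the paper
# above is NOT reproduced here). kd_loss, T and alpha are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soften both distributions with the same temperature T.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # KL between softened teacher and student, rescaled by T^2 as usual.
    distill = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
    # Ordinary cross-entropy against the hard labels.
    task = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * task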

A Survey of Knowledge Distillation Research in Deep Learning

邵仁荣, 刘宇昂, 张伟, 王骏 - Chinese Journal of Computers (计算机学报), 2022 - 159.226.43.17
Abstract With the rapid development of artificial intelligence today, deep neural networks have been widely applied across many research fields and achieved great success,
but they also face numerous challenges. First, in order to solve complex problems and improve model training performance …

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Abstract Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …
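
As a purely hypothetical sketch of the idea suggested by the title, excluding some classes from the distillation signal, the snippet below masks out selected classes in an elementwise KL term so the student is not pushed to imitate the teacher on them. This is not the method proposed in the paper; masked_kd_loss, keep_mask and T are illustrative names and values.

# Hypothetical sketch: drop selected classes from the distillation term by
# zeroing their elementwise KL contributions. Illustrates "not distilling
# every class"; this is NOT the paper's actual method.
import torch
import torch.nn.functional as F

def masked_kd_loss(student_logits, teacher_logits, keep_mask, T=4.0, eps=1e-12):
    # keep_mask: float tensor of shape (num_classes,), 1 = distill, 0 = skip.
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    # Elementwise KL contributions p_t * (log p_t - log p_s), one per class.
    per_class = p_t * (torch.log(p_t + eps) - log_p_s)
    per_class = per_class * keep_mask  # zero out the excluded classes
    return per_class.sum(dim=1).mean() * (T ** 2)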

DOT: A Distillation-Oriented Trainer

B Zhao, Q Cui, R Song, J Liang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract Knowledge distillation transfers knowledge from a large model to a small one via
task and distillation losses. In this paper, we observe a trade-off between task and distillation …
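
The trade-off mentioned here can be made concrete by comparing the gradients the two losses send to the student: if the cosine similarity between the task-loss gradient and the distillation-loss gradient is negative, the two objectives pull the parameters in conflicting directions. The sketch below is an illustrative diagnostic under that reading, not the DOT trainer itself; gradient_conflict and T are assumed names and values.

# Illustrative diagnostic (not the paper's DOT trainer): measure how the task
# (cross-entropy) gradient and the distillation (KL) gradient interact on the
# student's parameters. Negative cosine similarity indicates conflict.
import torch
import torch.nn.functional as F

def gradient_conflict(student, inputs, labels, teacher_logits, T=4.0):
    logits = student(inputs)
    task_loss = F.cross_entropy(logits, labels)
    distill_loss = F.kl_div(
        F.log_softmax(logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T ** 2)
    params = [p for p in student.parameters() if p.requires_grad]
    g_task = torch.autograd.grad(task_loss, params, retain_graph=True)
    g_dist = torch.autograd.grad(distill_loss, params)
    flatten = lambda grads: torch.cat([g.reshape(-1) for g in grads])
    return F.cosine_similarity(flatten(g_task), flatten(g_dist), dim=0)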

Weakly supervised semantic segmentation via alternate self-dual teaching

D Zhang, H Li, W Zeng, C Fang… - … on Image Processing, 2023 - ieeexplore.ieee.org
Weakly supervised semantic segmentation (WSSS) is a challenging yet important research
field in vision community. In WSSS, the key problem is to generate high-quality pseudo …

Do not blindly imitate the teacher: Using perturbed loss for knowledge distillation

R Zhang, J Shen, T Liu, J Liu, M Bendersky… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation is a popular technique to transfer knowledge from large teacher
models to a small student model. Typically, the student learns to imitate the teacher by …

Teacher's pet: understanding and mitigating biases in distillation

M Lukasik, S Bhojanapalli, AK Menon… - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge distillation is widely used as a means of improving the performance of a
relatively simple student model using the predictions from a complex teacher model. Several …

Learning from biased soft labels

H Yuan, Y Shi, N Xu, X Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Since the advent of knowledge distillation, many researchers have been intrigued by the
dark knowledge hidden in the soft labels generated by the teacher model. This …
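
As a concrete illustration of what this abstract calls dark knowledge: the teacher's soft labels place non-zero probability on classes other than the ground truth, and the relative sizes of those non-target probabilities encode inter-class similarity. The toy example below uses made-up logits (not from the paper) and simply prints the softened teacher distribution at two temperatures.

# Toy illustration of "dark knowledge" in soft labels: a softened teacher
# output puts more mass on classes it deems similar to the target.
# The logits below are invented for illustration.
import torch
import torch.nn.functional as F

classes = ["cat", "dog", "truck"]
teacher_logits = torch.tensor([6.0, 3.5, 0.5])  # an image of a cat

for T in (1.0, 4.0):
    probs = F.softmax(teacher_logits / T, dim=0)
    print(f"T={T}: " + ", ".join(f"{c}={p:.3f}" for c, p in zip(classes, probs)))
# Higher temperature exposes the non-target structure (dog >> truck),
# which is the extra signal a student can learn beyond the hard label.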