Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc
Unlike existing knowledge distillation methods, which focus on baseline settings where the
teacher models and training strategies are not as strong and competitive as state-of-the-art …
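
The snippet ends before the method itself; DIST's core idea is to relax the exact KL match between teacher and student predictions to a correlation-based match, so a much stronger teacher only needs to be matched up to scale and shift. A minimal PyTorch-style sketch of that idea (temperature, weighting, and function names are illustrative assumptions, not the paper's exact formulation):

```python
# Hedged sketch: correlation-based logit matching in the spirit of DIST.
import torch
import torch.nn.functional as F

def pearson_residual(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """1 - Pearson correlation, computed row-wise and averaged."""
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    corr = (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + eps)
    return (1.0 - corr).mean()

def dist_like_loss(student_logits, teacher_logits, tau: float = 4.0):
    p_s = F.softmax(student_logits / tau, dim=-1)   # [batch, classes]
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    inter = pearson_residual(p_s, p_t)              # class relations per sample
    intra = pearson_residual(p_s.t(), p_t.t())      # sample relations per class
    return inter + intra
```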

Automated knowledge distillation via Monte Carlo tree search

L Li, P Dong, Z Wei, Y Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we present Auto-KD, the first automated search framework for optimal
knowledge distillation design. Traditional distillation techniques typically require handcrafted …

Student customized knowledge distillation: Bridging the gap between student and teacher

Y Zhu, Y Wang - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
Knowledge distillation (KD) transfers the dark knowledge from cumbersome
networks (teacher) to lightweight (student) networks and expects the student to achieve …
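
As a reference point for the "dark knowledge" transfer this abstract describes, a minimal sketch of the standard softened-softmax KD objective such methods build on (temperature and mixing weight are illustrative):

```python
# Soften both logit distributions with a temperature, penalize their KL divergence,
# and blend with the usual cross-entropy on ground-truth labels.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, tau: float = 4.0, alpha: float = 0.5):
    soft_targets = F.softmax(teacher_logits / tau, dim=-1)
    log_probs = F.log_softmax(student_logits / tau, dim=-1)
    distill = F.kl_div(log_probs, soft_targets, reduction="batchmean") * tau * tau
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```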

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …

Decoupled knowledge distillation

B Zhao, Q Cui, R Song, Y Qiu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …
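
The truncated abstract argues that logit distillation is undervalued; decoupled KD splits the classical logit loss into a target-class term and a non-target-class term that can be weighted separately. A hedged sketch of that decomposition (the masking and default weights are assumptions based on my reading, not taken from the snippet):

```python
# Split the softened prediction into a target/non-target binary part (TCKD) and the
# distribution over non-target classes (NCKD), then weight the two KL terms separately.
import torch
import torch.nn.functional as F

def decoupled_kd_loss(student_logits, teacher_logits, labels,
                      tau: float = 4.0, alpha: float = 1.0, beta: float = 8.0):
    gt_mask = F.one_hot(labels, student_logits.size(-1)).float()

    p_s = F.softmax(student_logits / tau, dim=-1)
    p_t = F.softmax(teacher_logits / tau, dim=-1)

    # TCKD: binary distribution over {target class, all non-target classes}.
    b_s = torch.stack([(p_s * gt_mask).sum(-1), (p_s * (1 - gt_mask)).sum(-1)], dim=-1)
    b_t = torch.stack([(p_t * gt_mask).sum(-1), (p_t * (1 - gt_mask)).sum(-1)], dim=-1)
    tckd = F.kl_div(b_s.clamp_min(1e-8).log(), b_t, reduction="batchmean")

    # NCKD: distribution over non-target classes only (target logit masked out).
    log_nt_s = F.log_softmax(student_logits / tau - 1e9 * gt_mask, dim=-1)
    nt_t = F.softmax(teacher_logits / tau - 1e9 * gt_mask, dim=-1)
    nckd = F.kl_div(log_nt_s, nt_t, reduction="batchmean")

    return (alpha * tckd + beta * nckd) * tau * tau
```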

One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation

Z Hao, J Guo, K Han, Y Tang, H Hu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Knowledge distillation (KD) has proven to be a highly effective approach for
enhancing model performance through a teacher-student training scheme. However, most …

ALP-KD: Attention-based layer projection for knowledge distillation

P Passban, Y Wu, M Rezagholizadeh… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Knowledge distillation is considered a training and compression strategy in
which two neural networks, namely a teacher and a student, are coupled together during …
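
A hedged sketch of the attention-based layer projection named in the title: instead of pairing each student layer with a single teacher layer, the student hidden state attends over all teacher layers and is regressed onto the attention-weighted combination (the pooling, matching dimensions, and MSE objective are assumptions):

```python
import torch
import torch.nn.functional as F

def alp_style_loss(student_h: torch.Tensor, teacher_hs: torch.Tensor) -> torch.Tensor:
    """
    student_h : [batch, dim]            one student layer's pooled hidden state
    teacher_hs: [batch, layers, dim]    pooled hidden states of every teacher layer
    """
    # Attention weights from student-teacher similarity over teacher layers.
    scores = torch.einsum("bd,bld->bl", student_h, teacher_hs) / student_h.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)                       # [batch, layers]
    fused = torch.einsum("bl,bld->bd", weights, teacher_hs)   # weighted teacher summary
    return F.mse_loss(student_h, fused)
```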

Knowledge diffusion for distillation

T Huang, Y Zhang, M Zheng, S You… - Advances in …, 2023 - proceedings.neurips.cc
The representation gap between teacher and student is an emerging topic in knowledge
distillation (KD). To reduce the gap and improve the performance, current methods often …

Class attention transfer based knowledge distillation

Z Guo, H Yan, H Li, X Lin - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Previous knowledge distillation methods have shown impressive performance on
model compression tasks; however, it is hard to explain how the knowledge they transferred …
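
A hedged sketch of class attention transfer as the title suggests: derive CAM-style per-class attention maps from each network's final feature map, normalize them, and have the student mimic the teacher's maps (the normalization and matching loss here are assumptions):

```python
import torch
import torch.nn.functional as F

def class_attention_maps(feat: torch.Tensor, class_weights: torch.Tensor) -> torch.Tensor:
    """
    feat          : [batch, channels, H, W]  backbone feature map
    class_weights : [classes, channels]      classifier weights (CAM projection)
    returns       : [batch, classes, H*W]    L2-normalized per-class attention maps
    """
    cams = torch.einsum("kc,bchw->bkhw", class_weights, feat)
    return F.normalize(cams.flatten(2), dim=-1)

def cat_style_loss(student_feat, student_cls_w, teacher_feat, teacher_cls_w):
    # Spatial sizes must match; interpolate the student feature map if they differ.
    if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
        student_feat = F.interpolate(student_feat, size=teacher_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
    return F.mse_loss(class_attention_maps(student_feat, student_cls_w),
                      class_attention_maps(teacher_feat, teacher_cls_w))
```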

Lipschitz continuity guided knowledge distillation

Y Shang, B Duan, Z Zong, L Nie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Knowledge distillation has become one of the most important model compression
techniques by distilling knowledge from larger teacher networks to smaller student ones …
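
A hedged reconstruction of what Lipschitz-continuity guidance could look like: bound each network's Lipschitz constant by the product of layer-wise spectral norms (estimated with power iteration) and penalize the student for drifting from the teacher's bound. This is an illustrative sketch, not the paper's exact loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def spectral_norm_estimate(weight: torch.Tensor, iters: int = 5) -> torch.Tensor:
    # Power iteration on the flattened weight matrix; returns the largest singular value.
    w = weight.reshape(weight.size(0), -1)
    v = torch.randn(w.size(1), device=w.device)
    for _ in range(iters):
        u = F.normalize(w @ v, dim=0)
        v = F.normalize(w.t() @ u, dim=0)
    return torch.dot(u, w @ v)

def log_lipschitz_bound(model: nn.Module) -> torch.Tensor:
    # Log of the product of layer-wise spectral norms over linear/conv layers.
    logs = [spectral_norm_estimate(m.weight).clamp_min(1e-8).log()
            for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    return torch.stack(logs).sum()

def lipschitz_gap_loss(student: nn.Module, teacher: nn.Module) -> torch.Tensor:
    with torch.no_grad():
        target = log_lipschitz_bound(teacher)
    return (log_lipschitz_bound(student) - target).pow(2)
```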