Comparing Kullback-Leibler divergence and mean squared error loss in knowledge distillation

T Kim, J Oh, NY Kim, S Cho, SY Yun - arXiv preprint arXiv:2105.08919, 2021 - arxiv.org
Knowledge distillation (KD), transferring knowledge from a cumbersome teacher model to a
lightweight student model, has been investigated to design efficient neural architectures …
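
The comparison in this paper is concrete enough to sketch. Below is a minimal illustration, not the authors' code, of the two objectives being contrasted: KL divergence between temperature-softened distributions versus mean squared error applied directly to the logits. The temperature value and function names are my own choices.

```python
# Minimal sketch (not the authors' implementation) of the two KD objectives.
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened distributions (Hinton-style KD)."""
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def kd_mse_loss(student_logits, teacher_logits):
    """Mean squared error applied directly to the logits (logit matching)."""
    return F.mse_loss(student_logits, teacher_logits)

# Usage: either term would normally be blended with cross-entropy on hard labels.
s = torch.randn(8, 100)   # student logits (batch of 8, 100 classes)
t = torch.randn(8, 100)   # teacher logits
print(kd_kl_loss(s, t).item(), kd_mse_loss(s, t).item())
```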

What makes a "good" data augmentation in knowledge distillation - a statistical perspective

H Wang, S Lohit, MN Jones… - Advances in Neural …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) is a general neural network training approach that uses
a teacher model to guide the student model. Existing works mainly study KD from the …

Rethinking soft labels for knowledge distillation: A bias-variance tradeoff perspective

H Zhou, L Song, J Chen, Y Zhou, G Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge distillation is an effective approach to leverage a well-trained network or an
ensemble of them, referred to as the teacher, to guide the training of a student network. The …
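
Because the bias-variance argument concerns the soft-label term, it helps to recall the standard form in which soft labels enter the student objective (my notation; the paper's exact formulation may differ):

```latex
% z_s, z_t : student / teacher logits,  \sigma : softmax,  T : temperature
\[
\mathcal{L}_{\mathrm{KD}}
  = (1-\alpha)\,\mathrm{CE}\!\big(y,\ \sigma(z_s)\big)
  + \alpha\, T^{2}\,
    \mathrm{KL}\!\big(\sigma(z_t/T)\,\big\|\,\sigma(z_s/T)\big)
\]
```

Here α weights the teacher's soft distribution against the hard label y, and T controls how much the soft labels are smoothed.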

Preparing lessons: Improve knowledge distillation with better supervision

T Wen, S Lai, X Qian - Neurocomputing, 2021 - Elsevier
Knowledge distillation (KD) is widely applied in the training of efficient neural
networks. A compact model, which is trained to mimic the representation of a cumbersome …

Asymmetric temperature scaling makes larger networks teach well again

XC Li, WS Fan, S Song, Y Li… - Advances in neural …, 2022 - proceedings.neurips.cc
Knowledge Distillation (KD) aims at transferring the knowledge of a well-performed
neural network (the teacher) to a weaker one (the student). A peculiar phenomenon …
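
As a rough reading of the abstract, the "asymmetric" part can be sketched as softening the teacher's target-class logit and its non-target logits with two different temperatures before normalizing. The snippet below is an illustrative guess at that mechanism, not the authors' implementation; the function name and temperature values are assumptions.

```python
# Hedged sketch: apply one temperature to the ground-truth class logit and a
# different (larger) one to all other logits, so a very confident large teacher
# still yields informative wrong-class probabilities.
import torch
import torch.nn.functional as F

def asymmetric_softmax(logits, labels, t_target=1.0, t_other=4.0):
    """Scale the target-class logit by t_target and the rest by t_other,
    then renormalize with a softmax."""
    scaled = logits / t_other
    idx = torch.arange(logits.size(0))
    scaled[idx, labels] = logits[idx, labels] / t_target
    return F.softmax(scaled, dim=-1)

teacher_logits = torch.tensor([[8.0, 1.0, 0.5, -0.5]])
labels = torch.tensor([0])
print(asymmetric_softmax(teacher_logits, labels))
```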

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …

Understanding and improving knowledge distillation

J Tang, R Shivanna, Z Zhao, D Lin, A Singh… - arXiv preprint arXiv …, 2020 - arxiv.org
Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while
having a fixed capacity budget. It is a commonly used technique for model compression …

One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation

Z Hao, J Guo, K Han, Y Tang, H Hu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Knowledge distillation (KD) has proven to be a highly effective approach for
enhancing model performance through a teacher-student training scheme. However, most …

Knowledge distillation beyond model compression

F Sarfraz, E Arani, B Zonooz - 2020 25th International …, 2021 - ieeexplore.ieee.org
Knowledge distillation (KD) is commonly regarded as an effective model compression
technique in which a compact model (student) is trained under the supervision of a larger …

Logit standardization in knowledge distillation

S Sun, W Ren, J Li, R Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Knowledge distillation involves transferring soft labels from a teacher to a student
using a shared temperature-based softmax function. However, the assumption of a shared …
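
Taking the abstract at face value, the alternative to a shared temperature can be sketched as z-score standardizing each sample's logits before the softmax, so that student and teacher are matched on the shape of their logit distributions rather than their absolute scale. The snippet below is a hedged sketch under that assumption, not the authors' code; the base temperature and function names are my own.

```python
# Hedged sketch: per-sample z-score standardization of logits before the
# temperature softmax, then the usual KL-based distillation term.
import torch
import torch.nn.functional as F

def standardized_log_probs(logits, T=2.0, eps=1e-7):
    mu = logits.mean(dim=-1, keepdim=True)
    sigma = logits.std(dim=-1, keepdim=True)
    z = (logits - mu) / (sigma + eps)      # per-sample z-score of the logits
    return F.log_softmax(z / T, dim=-1)

s_logits = torch.randn(4, 10) * 5.0   # student logits at a larger scale
t_logits = torch.randn(4, 10)         # teacher logits
kd = F.kl_div(standardized_log_probs(s_logits),
              standardized_log_probs(t_logits).exp(),
              reduction="batchmean")
print(kd.item())
```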