Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Hierarchical multi-attention transfer for knowledge distillation

J Gou, L Sun, B Yu, S Wan, D Tao - ACM Transactions on Multimedia …, 2023 - dl.acm.org
Knowledge distillation (KD) is a powerful and widely applicable technique for the
compression of deep learning models. The main idea of knowledge distillation is to transfer …

Knowledge distillation in deep learning and its applications

A Alkhulaifi, F Alsahli, I Ahmad - PeerJ Computer Science, 2021 - peerj.com
Deep learning-based models are relatively large, and it is hard to deploy such models on
resource-limited devices such as mobile phones and embedded devices. One possible …

Knowledge distillation via softmax regression representation learning

J Yang, B Martinez, A Bulat, G Tzimiropoulos - 2021 - qmro.qmul.ac.uk
This paper addresses the problem of model compression via knowledge distillation. We
advocate for a method that optimizes the output feature of the penultimate layer of the …
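
The snippet above refers to distilling the penultimate-layer representation rather than only the logits. As a rough illustration of that kind of feature-level objective, the PyTorch sketch below projects the student's penultimate feature to the teacher's dimension and applies an L2 loss; the linear projector, the class name, and the loss choice are assumptions for illustration, not the exact formulation of the listed paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMatchingKD(nn.Module):
    """Generic penultimate-layer feature distillation (illustrative sketch)."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Linear projector to align the student's feature dimension with the teacher's.
        self.projector = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        projected = self.projector(student_feat)
        # L2 (mean-squared error) between the projected student feature and the
        # detached teacher feature, so no gradients flow into the teacher.
        return F.mse_loss(projected, teacher_feat.detach())
```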

Knowledge distillation with the reused teacher classifier

D Chen, JP Mei, H Zhang, C Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation aims to compress a powerful yet cumbersome teacher model
into a lightweight student model without much sacrifice of performance. For this purpose …

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge distillation (KD) can effectively compress neural networks by training a
smaller network (student) to simulate the behavior of a larger one (teacher). A counter …
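
Several snippets in this list describe the same underlying mechanism: a compact student network is trained to reproduce the softened output distribution of a larger teacher. A minimal sketch of that standard soft-target loss is given below, assuming a PyTorch setup; the function name, temperature, and weighting defaults are illustrative choices, not values taken from any of the listed papers.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic soft-target distillation objective (illustrative sketch)."""
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label term: KL divergence between temperature-softened student and
    # teacher distributions, scaled by T^2 to keep gradient magnitudes comparable.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    return alpha * ce + (1.0 - alpha) * kl
```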

Annealing knowledge distillation

A Jafari, M Rezagholizadeh, P Sharma… - arXiv preprint arXiv …, 2021 - arxiv.org
The significant memory and computational requirements of large deep neural networks restrict
their deployment on edge devices. Knowledge distillation (KD) is a prominent model …

ALP-KD: Attention-based layer projection for knowledge distillation

P Passban, Y Wu, M Rezagholizadeh… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Knowledge distillation is considered a training and compression strategy in
which two neural networks, namely a teacher and a student, are coupled together during …

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
In recent years, deep neural models have been successful in almost every field, even
solving the most complex problems. However, these models are huge in size, with …

Prune your model before distill it

J Park, A No - European Conference on Computer Vision, 2022 - Springer
Knowledge distillation transfers the knowledge from a cumbersome teacher to a
small student. Recent results suggest that a student-friendly teacher is more appropriate to …
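
The entry above argues that a pruned, "student-friendly" teacher can make a better distillation source than the original dense teacher. As a hedged sketch of what a prune-then-distill pipeline might look like in PyTorch, the helper below sparsifies the teacher's linear and convolutional weights with the built-in torch.nn.utils.prune utilities before the usual distillation step; the L1 criterion and the 30% ratio are assumptions, not the procedure proposed in the paper.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_teacher(teacher: nn.Module, amount: float = 0.3) -> nn.Module:
    """Sparsify the teacher in place before distillation (illustrative sketch)."""
    for module in teacher.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            # Zero out the smallest-magnitude weights in each layer.
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Fold the pruning mask into the weight tensor permanently.
            prune.remove(module, "weight")
    return teacher
```

After pruning, the teacher would be used exactly as in a standard distillation run, for example to produce the teacher_logits consumed by a loss such as the kd_loss sketch above.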