Logit standardization in knowledge distillation

S Sun, W Ren, J Li, R Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Abstract: Knowledge distillation involves transferring soft labels from a teacher to a student
using a shared temperature-based softmax function. However, the assumption of a shared …
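
For context, a minimal sketch of the conventional shared-temperature softmax distillation loss this snippet refers to (Hinton-style KD in PyTorch; the function name and the temperature value are illustrative choices, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def shared_temperature_kd_loss(student_logits: torch.Tensor,
                               teacher_logits: torch.Tensor,
                               T: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs.

    The same temperature T is applied to both models' logits, which is the
    shared-temperature assumption the cited paper questions.
    """
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales the gradient magnitude so it stays comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```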

Reciprocal Teacher-Student Learning via Forward and Feedback Knowledge Distillation

J Gou, Y Chen, B Yu, J Liu, L Du… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Knowledge distillation (KD) is a prevalent model compression technique in deep learning,
aiming to leverage knowledge from a large teacher model to enhance the training of a …

Efficient crowd counting via dual knowledge distillation

R Wang, Y Hao, L Hu, X Li, M Chen… - … on Image Processing, 2023 - ieeexplore.ieee.org
Most researchers focus on designing accurate crowd counting models with heavy
parameters and computations but ignore the resource burden during the model deployment …

Instance Temperature Knowledge Distillation

Z Zhang, Y Zhou, J Gong, J Liu, Z Tu - arXiv preprint arXiv:2407.00115, 2024 - arxiv.org
Knowledge distillation (KD) enhances the performance of a student network by allowing it to
learn the knowledge transferred from a teacher network incrementally. Existing methods …

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

A Parchami-Araghi, M Böhle, S Rao… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Distillation (KD) has proven effective for compressing large teacher models into
smaller student models. While it is well known that student models can achieve similar …

Expanding and Refining Hybrid Compressors for Efficient Object Re-identification

Y Xie, H Wu, J Zhu, H Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recent object re-identification (Re-ID) methods gain high efficiency via lightweight student
models trained by knowledge distillation (KD). However, the huge architectural difference …

Attention and feature transfer based knowledge distillation

G Yang, S Yu, Y Sheng, H Yang - Scientific Reports, 2023 - nature.com
Existing knowledge distillation (KD) methods are mainly based on features, logic, or
attention, where features and logic represent the results of reasoning at different stages of a …

Maximizing discrimination capability of knowledge distillation with energy function

S Kim, G Ham, S Lee, D Jang, D Kim - Knowledge-Based Systems, 2024 - Elsevier
To apply the latest computer vision techniques that require a large computational cost in real
industrial applications, knowledge distillation methods (KDs) are essential. Existing logit …

Post-Distillation via Neural Resuscitation

Z Bao, Z Chen, C Wang, WS Zheng… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Knowledge distillation, a widely adopted model compression technique, distils knowledge
from a large teacher model to a smaller student model, with the goal of reducing the …

Few-shot Classification Model Compression via School Learning

S Yang, F Liu, D Chen, H Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Few-shot classification (FSC) is a challenging task due to limited access to training
data. Recent methods often employ highly complex networks to obtain high-quality features …