minimal loss knowledge distillation - Academic Resource Search

Similarity-preserving knowledge distillation

F Tung, G Mori - Proceedings of the IEEE/CVF international …, 2019 - openaccess.thecvf.com

… to compress large networks into more resource-efficient ones with minimal accuracy loss. In
… We presented in this paper a novel distillation loss for capturing and transferring knowledge …

Cited by 1067 · Related articles · All 7 versions

[PDF] neurips.cc

Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc

… misalignment between KD loss and classification loss would be severer, thus disturbing the
student’s training. As a result, the exact match (i.e., the loss reaches the minimal if and only if …

Cited by 147 · Related articles · All 6 versions

[PDF] thecvf.com

Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation

T Zhang, M Xue, J Zhang, H Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

… generalizable solution by weighted averaging the local minimum located in the border of
areas with lower loss. Considering the relationship between the loss landscape’s geometry and …

Cited by 2 · Related articles · All 6 versions

[PDF] thecvf.com

Knowledge distillation: A good teacher is patient and consistent

L Beyer, X Zhai, A Royer, L Markeeva… - Proceedings of the …, 2022 - openaccess.thecvf.com

… knowledge distillation approach which does not suffer from these drawbacks. The idea behind
knowledge distillation is to “distill” … We closely follow the original distillation setup from [12] …

Cited by 259 · Related articles · All 7 versions

[PDF] thecvf.com

Knowledge distillation via route constrained optimization

X Jin, B Peng, Y Wu, Y Liu, J Liu… - Proceedings of the …, 2019 - openaccess.thecvf.com

… Besides the visualization of optimization trajectory, we also observe that the new local
minimum has better generalization capacity and is more robust to random noise in input space. …

Cited by 186 · Related articles · All 5 versions

[PDF] arxiv.org

Parameter-efficient and student-friendly knowledge distillation

J Rao, X Meng, L Ding, S Qi, X Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

… adapter module [19] for knowledge distillation with updating parameters of this adapter …
knowledge distillation methods for comparison, including Vanilla [1], knowledge distillation …

Cited by 22 · Related articles · All 4 versions

[PDF] thecvf.com

Online knowledge distillation via collaborative learning

Q Guo, X Wang, Y Wu, Z Yu, D Liang… - Proceedings of the …, 2020 - openaccess.thecvf.com

… Loss function To improve the generalization performance, we distill the knowledge of soft …
Then a neat way to generate teacher logit is to select the minimum element of each row of …

Cited by 301 · Related articles · All 6 versions

[PDF] neurips.cc

Teach less, learn more: On the undistillable classes in knowledge distillation

Y Zhu, N Liu, Z Xu, X Liu, W Meng… - Advances in …, 2022 - proceedings.neurips.cc

… in knowledge distillation and a surge of related literature on designing better distillation …
this problem, but few minimal efforts have been put into understanding this phenomenon. …

Cited by 16 · Related articles · All 4 versions

[PDF] arxiv.org

Distilling spikes: Knowledge distillation in spiking neural networks

RK Kushawaha, S Kumar, B Banerjee… - 2020 25th …, 2021 - ieeexplore.ieee.org

… In this work, we propose a Knowledge distillation that allows transferring the knowledge of
the large SNN to smaller one in a disciplined fashion with minimal loss in performance. We …

Cited by 33 · Related articles · All 8 versions

… for reducing arrhythmia classification from 12-lead ECG signals to single-lead ECG with minimal loss of accuracy through teacher-student knowledge distillation

M Sepahvand, F Abdali-Mohammadi - Information Sciences, 2022 - Elsevier

… Knowledge distillation was utilized in this paper to propose a method for bridging the gap …
Despite its simplicity, the student model receives the dark knowledge of multi-lead ECG signals …

Cited by 53 · Related articles · All 2 versions
