Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation

T Kim, J Oh, NY Kim, S Cho, SY Yun - arXiv preprint arXiv:2105.08919, 2021 - arxiv.org
Knowledge distillation (KD), transferring knowledge from a cumbersome teacher model to a
lightweight student model, has been investigated to design efficient neural architectures …
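
The title contrasts the two common distillation objectives. Below is a minimal PyTorch sketch of both, assuming the standard Hinton-style temperature-softened KL formulation and plain logit matching for the MSE variant; the function names, temperature value, and random logits are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, tau=4.0):
    """KL-based KD loss: divergence between temperature-softened
    teacher and student distributions, scaled by tau^2."""
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    return (tau ** 2) * F.kl_div(log_p_student, p_teacher, reduction="batchmean")

def kd_mse_loss(student_logits, teacher_logits):
    """Logit-matching alternative: mean squared error between the raw
    student and teacher logits."""
    return F.mse_loss(student_logits, teacher_logits)

# Usage with random logits (batch of 8 examples, 10 classes).
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
print(kd_kl_loss(student, teacher).item(), kd_mse_loss(student, teacher).item())
```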
