Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

J Lv, H Yang, P Li - arXiv preprint arXiv:2412.08139, 2024 - arxiv.org
Since the pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved …

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

J Lv, H Yang, P Li - The Thirty-eighth Annual Conference on Neural … - openreview.net
Since the pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved …
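For context, the abstract refers to the classical KL-divergence distillation objective of Hinton et al., in which the student is trained to match the teacher's temperature-softened class probabilities. Below is a minimal sketch of that baseline loss, assuming PyTorch; the function name, temperature value, and tensor shapes are illustrative choices, and this is the KL-Div baseline the paper argues against, not the Wasserstein-distance-based objective the paper proposes.

```python
# Minimal sketch of the classical KL-divergence distillation loss
# (Hinton et al., "Distilling the Knowledge in a Neural Network").
# This is the KL-Div baseline the abstract refers to, not the
# Wasserstein-based objective proposed in the cited paper.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor,
                         temperature: float = 4.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened class distributions."""
    # Soften both distributions with the same temperature T.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # 'batchmean' averages the KL divergence over the batch; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Example: distill a 10-class teacher into a student on a batch of 8.
if __name__ == "__main__":
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    loss = kl_distillation_loss(student_logits, teacher_logits)
    loss.backward()
    print(loss.item())
```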