Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to
compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
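For reference, the snippet contrasts forward and reverse KL objectives; below is a minimal sketch of the two, written in standard notation (teacher distribution p, student distribution q_θ over next tokens y given context x) that is assumed here rather than taken from the truncated abstract.

```latex
% Forward KL: teacher p, student q_\theta. Mean-seeking; penalizes the student
% for assigning low probability where the teacher assigns high probability.
\[
D_{\mathrm{KL}}\!\left(p \,\Vert\, q_\theta\right)
  = \sum_{y} p(y \mid x)\,\log \frac{p(y \mid x)}{q_\theta(y \mid x)}
\]

% Reverse KL: arguments swapped. Mode-seeking; penalizes the student for
% placing probability mass where the teacher assigns little.
\[
D_{\mathrm{KL}}\!\left(q_\theta \,\Vert\, p\right)
  = \sum_{y} q_\theta(y \mid x)\,\log \frac{q_\theta(y \mid x)}{p(y \mid x)}
\]
```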
