Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment

T Peng, J Zhang - arXiv preprint arXiv:2409.12545, 2024 - arxiv.org
Knowledge distillation (KD) is an effective model compression method that can transfer the
internal capabilities of large language models (LLMs) to smaller ones. However, the multi …
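
The abstract snippet above describes standard white-box knowledge distillation, in which a smaller student model is trained to match the token-level output distribution of a larger teacher. The sketch below shows only that common baseline for context; it is not the paper's multi-modal distribution alignment method, and the temperature value, tensor shapes, and function name are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch of baseline white-box KD for language models (an assumption
# for illustration, not the method proposed in the paper above): the student
# is trained to match the teacher's per-token output distribution via KL
# divergence, as in Hinton et al. (2015).
import torch
import torch.nn.functional as F


def token_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged per token.

    Both logits tensors have shape [batch, seq_len, vocab_size].
    """
    t = temperature
    vocab = student_logits.size(-1)
    # Flatten to [batch * seq_len, vocab] so "batchmean" averages per token.
    s = student_logits.reshape(-1, vocab)
    te = teacher_logits.reshape(-1, vocab)
    student_log_probs = F.log_softmax(s / t, dim=-1)
    teacher_probs = F.softmax(te / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)


if __name__ == "__main__":
    # Toy shapes only; in practice this KL term is added to the usual
    # cross-entropy loss on ground-truth tokens.
    B, L, V = 2, 8, 32000
    student_logits = torch.randn(B, L, V, requires_grad=True)
    teacher_logits = torch.randn(B, L, V)
    loss = token_level_kd_loss(student_logits, teacher_logits)
    loss.backward()
    print(loss.item())
```

Per its title and abstract, the paper addresses limitations of this baseline tied to the multi-modal shape of the teacher's output distribution, which the simple KL objective above does not treat specially.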

Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment

T Peng, J Zhang - … of the 31st International Conference on …, 2025 - aclanthology.org
Knowledge distillation (KD) is an effective model compression method that can
transfer the internal capabilities of large language models (LLMs) to smaller ones. However …