T Peng, J Zhang - … of the 31st International Conference on …, 2025 - aclanthology.org
Abstract: Knowledge distillation (KD) is an effective model compression method that can
transfer the internal capabilities of large language models (LLMs) to smaller ones. However …
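The snippet cuts off before describing the paper's own method, so as background only, here is a minimal sketch of the classic soft-label distillation objective (Hinton et al., 2015): the student is trained to match the teacher's temperature-softened output distribution via KL divergence. This is the generic KD loss, not necessarily what Peng and Zhang propose; all names and the temperature value below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Generic soft-label KD loss (background sketch, not this paper's method).

    KL divergence between the temperature-scaled teacher and student
    output distributions over the vocabulary/class dimension.
    """
    t = temperature
    # Student side must be log-probabilities for F.kl_div.
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Teacher side stays as probabilities (F.kl_div default target form).
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 examples over a 10-way output space.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # teacher outputs, no gradient needed
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```

In practice this soft-label term is typically combined with the ordinary cross-entropy loss on ground-truth labels; the weighting between the two, like the temperature, is a tuning choice.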