LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

R Yang, T Wu, J Wang, P Hu, N Wong… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose a novel LLM-Neo framework that efficiently transfers knowledge
from a large language model (LLM) teacher to a compact student. Initially, we revisit the …

DDK: Distilling Domain Knowledge for Efficient Large Language Models

J Liu, C Zhang, J Guo, Y Zhang, H Que, K Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the advanced intelligence abilities of large language models (LLMs) in various
applications, they still face significant computational and storage demands. Knowledge …

Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting

V Goyal, M Khan, A Tirupati, H Saini, M Lam… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across a wide
range of natural language processing (NLP) tasks. However, these models are often difficult …

Dual-Space Knowledge Distillation for Large Language Models

S Zhang, X Zhang, Z Sun, Y Chen, J Xu - arXiv preprint arXiv:2406.17328, 2024 - arxiv.org
Knowledge distillation (KD) is known as a promising solution to compress large language
models (LLMs) via transferring their knowledge to smaller models. During this process, white …

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

G Kim, D Jang, E Yang - arXiv preprint arXiv:2402.12842, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have raised concerns about
inference costs, increasing the need for research into model compression. While knowledge …

Pre-training Distillation for Large Language Models: A Design Space Exploration

H Peng, X Lv, Y Bai, Z Yao, J Zhang, L Hou… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation (KD) aims to transfer knowledge from a large teacher model to a
smaller student model. Previous work applying KD in the field of large language models …

MiniLLM: Knowledge Distillation of Large Language Models

Y Gu, L Dong, F Wei, M Huang - The Twelfth International …, 2024 - openreview.net
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …

MiniPLM: Knowledge Distillation for Pre-Training Language Models

Y Gu, H Zhou, F Meng, J Zhou, M Huang - arXiv preprint arXiv:2410.17215, 2024 - arxiv.org
Knowledge distillation (KD) is widely used to train small, high-performing student language
models (LMs) using large teacher LMs. While effective in fine-tuning, KD during pre-training …

Sparse Mixture of Experts Language Models Excel in Knowledge Distillation

H Xu, H Liu, W Gong, X Deng, H Wang - CCF International Conference on …, 2024 - Springer
Knowledge distillation is an effective method for reducing the computational
overhead of large language models. However, recent optimization efforts in distilling large …
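
All of the entries above build on the standard teacher–student distillation objective. As a point of reference only, and not the method of any specific paper listed here, the following is a minimal sketch of token-level distillation with a temperature-scaled KL loss, assuming PyTorch and random logits standing in for real teacher and student model outputs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic token-level KD loss: KL divergence between the softened
    teacher and student next-token distributions (illustrative only)."""
    vocab = student_logits.size(-1)
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so 'batchmean'
    # averages over all token positions.
    s = (student_logits / temperature).reshape(-1, vocab)
    t = (teacher_logits / temperature).reshape(-1, vocab)
    student_log_probs = F.log_softmax(s, dim=-1)
    teacher_probs = F.softmax(t, dim=-1)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2

# Usage sketch with hypothetical shapes; real use would take logits from
# a large teacher LM and a smaller student LM on the same input tokens.
batch, seq_len, vocab = 2, 16, 32000
student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
teacher_logits = torch.randn(batch, seq_len, vocab)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

The papers in this listing differ mainly in what they add on top of this baseline (parameter-efficient training, domain-aware data mixing, prompt-based transfer, dual-space projection, or pre-training-stage distillation), per their respective abstracts.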