Y Gu, H Zhou, F Meng, J Zhou, M Huang - arXiv preprint arXiv:2410.17215, 2024 - arxiv.org
Knowledge distillation (KD) is widely used to train small, high-performing student language
models (LMs) using large teacher LMs. While effective in fine-tuning, KD during pre-training …
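Since the abstract is truncated, the paper's specific pre-training KD objective is not shown here. The following is only a minimal sketch of the standard knowledge-distillation loss the snippet alludes to (soft-target KL divergence between teacher and student token distributions), assuming a PyTorch setup; the function name, temperature, and mixing weight are illustrative assumptions, not the cited paper's method.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """Forward KL divergence between softened teacher and student distributions.

    Both logit tensors are assumed to have shape (batch, seq_len, vocab_size).
    """
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # KL(teacher || student): summed over sequence and vocabulary,
    # averaged over the batch dimension. The T^2 factor keeps gradient
    # magnitudes comparable across temperatures (Hinton et al., 2015).
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * (temperature ** 2)

# Typical usage: mix with the usual cross-entropy on ground-truth tokens,
# e.g. total_loss = ce_loss + alpha * kd_loss(student_logits, teacher_logits),
# where alpha is a hyperparameter (an assumption, not from the paper).
```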