LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

R Yang, T Wu, J Wang, P Hu, N Wong… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose a novel LLM-Neo framework that efficiently transfers knowledge
from a large language model (LLM) teacher to a compact student. Initially, we revisit the …

DDK: Distilling Domain Knowledge for Efficient Large Language Models

J Liu, C Zhang, J Guo, Y Zhang, H Que, K Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the advanced intelligence abilities of large language models (LLMs) in various
applications, they still face significant computational and storage demands. Knowledge …

Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting

V Goyal, M Khan, A Tirupati, H Saini, M Lam… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across a wide
range of natural language processing (NLP) tasks. However, these models are often difficult …

Dual-Space Knowledge Distillation for Large Language Models

S Zhang, X Zhang, Z Sun, Y Chen, J Xu - arXiv preprint arXiv:2406.17328, 2024 - arxiv.org
Knowledge distillation (KD) is known as a promising solution to compress large language
models (LLMs) via transferring their knowledge to smaller models. During this process, white …

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

G Kim, D Jang, E Yang - arXiv preprint arXiv:2402.12842, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have raised concerns about
inference costs, increasing the need for research into model compression. While knowledge …

Pre-training Distillation for Large Language Models: A Design Space Exploration

H Peng, X Lv, Y Bai, Z Yao, J Zhang, L Hou… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation (KD) aims to transfer knowledge from a large teacher model to a
smaller student model. Previous work applying KD in the field of large language models …

MiniLLM: Knowledge Distillation of Large Language Models

Y Gu, L Dong, F Wei, M Huang - The Twelfth International …, 2024 - openreview.net
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …

MiniPLM: Knowledge Distillation for Pre-Training Language Models

Y Gu, H Zhou, F Meng, J Zhou, M Huang - arXiv preprint arXiv:2410.17215, 2024 - arxiv.org
Knowledge distillation (KD) is widely used to train small, high-performing student language
models (LMs) using large teacher LMs. While effective in fine-tuning, KD during pre-training …

Sparse Mixture of Experts Language Models Excel in Knowledge Distillation

H Xu, H Liu, W Gong, X Deng, H Wang - CCF International Conference on …, 2024 - Springer
Knowledge distillation is an effective method for reducing the computational
overhead of large language models. However, recent optimization efforts in distilling large …
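
All of the entries above build on the standard teacher–student distillation objective. As a point of reference only, and not the method of any specific paper listed here, the following is a minimal sketch of token-level distillation with a temperature-scaled KL loss, assuming PyTorch and random logits standing in for real teacher and student model outputs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic token-level KD loss: KL divergence between the softened
    teacher and student next-token distributions (illustrative only)."""
    vocab = student_logits.size(-1)
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so 'batchmean'
    # averages over all token positions.
    s = (student_logits / temperature).reshape(-1, vocab)
    t = (teacher_logits / temperature).reshape(-1, vocab)
    student_log_probs = F.log_softmax(s, dim=-1)
    teacher_probs = F.softmax(t, dim=-1)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * temperature ** 2

# Usage sketch with hypothetical shapes; real use would take logits from
# a large teacher LM and a smaller student LM on the same input tokens.
batch, seq_len, vocab = 2, 16, 32000
student_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
teacher_logits = torch.randn(batch, seq_len, vocab)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

The papers in this listing differ mainly in what they add on top of this baseline (parameter-efficient training, domain-aware data mixing, prompt-based transfer, dual-space projection, or pre-training-stage distillation), per their respective abstracts.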