Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily …
Y Gu, H Zhou, F Meng, J Zhou, M Huang - arXiv preprint arXiv:2410.17215, 2024 - arxiv.org
Knowledge distillation (KD) is widely used to train small, high-performing student language models (LMs) using large teacher LMs. While effective in fine-tuning, KD during pre-training …
Knowledge distillation (KD) involves training a small “student” model to replicate the strong performance of a high-capacity “teacher” model, enabling efficient deployment in …
G Kim, D Jang, E Yang - arXiv preprint arXiv:2402.12842, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge …
Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white …
Teacher-student knowledge distillation is a popular technique for compressing today's prevailing large language models into manageable sizes that fit low-latency downstream …
R Yang, T Wu, J Wang, P Hu, N Wong… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose a novel LLM-Neo framework that efficiently transfers knowledge from a large language model (LLM) teacher to a compact student. Initially, we revisit the …
J Rao, X Liu, Z Lin, L Ding, J Li, D Tao… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them. The success of KD in auto-regressive language …
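The snippets above all refer to the same core teacher-student objective: the student is trained to match the teacher's token-level output distribution. As a point of reference only (not the method of any specific paper listed here), a minimal sketch of that standard white-box KD loss is given below; the function name, tensor shapes, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Token-level KD loss: KL divergence between the teacher's softened
    output distribution and the student's.

    Both logits tensors are assumed to have shape (batch, seq_len, vocab).
    """
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures (standard Hinton-style scaling).
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature**2
```

In practice this term is typically combined with the ordinary cross-entropy loss on the ground-truth tokens; the papers listed above differ mainly in which divergence is used, whose outputs the student trains on, and at which stage (fine-tuning vs. pre-training) the distillation is applied.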