A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …

Rethinking Kullback-Leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to
compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
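
As context for this entry: forward KL, KL(teacher || student), penalizes the student wherever the teacher assigns probability mass (mean-seeking), while reverse KL, KL(student || teacher), lets the student concentrate on the teacher's dominant modes (mode-seeking). Below is a minimal pure-Python sketch of the two directions over a toy next-token distribution; the probabilities are hypothetical and not taken from the paper.

    import math

    def kl(p, q):
        # D_KL(p || q) = sum_i p_i * log(p_i / q_i), summed where p_i > 0
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    teacher = [0.7, 0.2, 0.1]    # hypothetical teacher next-token distribution
    student = [0.4, 0.35, 0.25]  # hypothetical student next-token distribution

    forward_kl = kl(teacher, student)  # KL(teacher || student): mean-seeking
    reverse_kl = kl(student, teacher)  # KL(student || teacher): mode-seeking

    print(f"forward KL = {forward_kl:.4f}, reverse KL = {reverse_kl:.4f}")

The two values generally differ, which is why the choice of direction changes what the student learns to imitate.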

Cost-effective distillation of large language models

S Dasgupta, T Cohn, T Baldwin - Findings of the Association for …, 2023 - aclanthology.org
Knowledge distillation (KD) involves training a small “student” model to replicate the
strong performance of a high-capacity “teacher” model, enabling efficient deployment in …

MixKD: Towards efficient distillation of large-scale language models

KJ Liang, W Hao, D Shen, Y Zhou, W Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale language models have recently demonstrated impressive empirical
performance. Nevertheless, the improved results are attained at the price of bigger models …

Trends in integration of knowledge and large language models: A survey and taxonomy of methods, benchmarks, and applications

Z Feng, W Ma, W Yu, L Huang, H Wang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) exhibit superior performance on various natural language
tasks, but they are susceptible to issues stemming from outdated data and domain-specific …

MiniLLM: Knowledge distillation of large language models

Y Gu, L Dong, F Wei, M Huang - The Twelfth International …, 2024 - openreview.net
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …
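
For orientation, the standard white-box KD baseline that work like this builds on minimizes a temperature-softened forward KL between teacher and student token distributions (Hinton et al., 2015); MiniLLM itself departs from that baseline by optimizing a reverse-KL objective. The following is a minimal PyTorch sketch of the generic baseline only, not the paper's method; the shapes and temperature are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, temperature=2.0):
        # Forward KL(teacher || student) over temperature-softened
        # next-token distributions, scaled by T^2 as in Hinton et al. (2015).
        t = temperature
        log_q = F.log_softmax(student_logits / t, dim=-1)  # student log-probs
        p = F.softmax(teacher_logits / t, dim=-1)          # teacher probs
        return F.kl_div(log_q, p, reduction="batchmean") * (t * t)

    # Illustrative shapes: (batch, vocab) logits from both models.
    print(kd_loss(torch.randn(4, 32000), torch.randn(4, 32000)))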

Dynamic knowledge distillation for pre-trained language models

L Li, Y Lin, S Ren, P Li, J Zhou, X Sun - arXiv preprint arXiv:2109.11295, 2021 - arxiv.org
Knowledge distillation (KD) has been proven effective for compressing large-scale pre-
trained language models. However, existing methods conduct KD statically, e.g., the student …

Improving multi-task deep neural networks via knowledge distillation for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1904.09482, 2019 - arxiv.org
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural
Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural …

Knowledge editing for large language models: A survey

S Wang, Y Zhu, H Liu, Z Zheng, C Chen, J Li - ACM Computing Surveys, 2024 - dl.acm.org
Large Language Models (LLMs) have recently transformed both the academic and industrial
landscapes due to their remarkable capacity to understand, analyze, and generate texts …