A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …

Rethinking Kullback-Leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to
compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
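
As context for this entry: forward KL, KL(teacher || student), penalizes the student wherever the teacher assigns probability mass (mean-seeking), while reverse KL, KL(student || teacher), lets the student concentrate on the teacher's dominant modes (mode-seeking). Below is a minimal pure-Python sketch of the two directions over a toy next-token distribution; the probabilities are hypothetical and not taken from the paper.

    import math

    def kl(p, q):
        # D_KL(p || q) = sum_i p_i * log(p_i / q_i), summed where p_i > 0
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    teacher = [0.7, 0.2, 0.1]    # hypothetical teacher next-token distribution
    student = [0.4, 0.35, 0.25]  # hypothetical student next-token distribution

    forward_kl = kl(teacher, student)  # KL(teacher || student): mean-seeking
    reverse_kl = kl(student, teacher)  # KL(student || teacher): mode-seeking

    print(f"forward KL = {forward_kl:.4f}, reverse KL = {reverse_kl:.4f}")

The two values generally differ, which is why the choice of direction changes what the student learns to imitate.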

Cost-effective distillation of large language models

S Dasgupta, T Cohn, T Baldwin - Findings of the Association for …, 2023 - aclanthology.org
Knowledge distillation (KD) involves training a small “student” model to replicate the
strong performance of a high-capacity “teacher” model, enabling efficient deployment in …

MixKD: Towards efficient distillation of large-scale language models

KJ Liang, W Hao, D Shen, Y Zhou, W Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale language models have recently demonstrated impressive empirical
performance. Nevertheless, the improved results are attained at the price of bigger models …

Trends in integration of knowledge and large language models: A survey and taxonomy of methods, benchmarks, and applications

Z Feng, W Ma, W Yu, L Huang, H Wang, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) exhibit superior performance on various natural language
tasks, but they are susceptible to issues stemming from outdated data and domain-specific …

MiniLLM: Knowledge distillation of large language models

Y Gu, L Dong, F Wei, M Huang - The Twelfth International …, 2024 - openreview.net
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …
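
For orientation, the standard white-box KD baseline that work like this builds on minimizes a temperature-softened forward KL between teacher and student token distributions (Hinton et al., 2015); MiniLLM itself departs from that baseline by optimizing a reverse-KL objective. The following is a minimal PyTorch sketch of the generic baseline only, not the paper's method; the shapes and temperature are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, temperature=2.0):
        # Forward KL(teacher || student) over temperature-softened
        # next-token distributions, scaled by T^2 as in Hinton et al. (2015).
        t = temperature
        log_q = F.log_softmax(student_logits / t, dim=-1)  # student log-probs
        p = F.softmax(teacher_logits / t, dim=-1)          # teacher probs
        return F.kl_div(log_q, p, reduction="batchmean") * (t * t)

    # Illustrative shapes: (batch, vocab) logits from both models.
    print(kd_loss(torch.randn(4, 32000), torch.randn(4, 32000)))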

Dynamic knowledge distillation for pre-trained language models

L Li, Y Lin, S Ren, P Li, J Zhou, X Sun - arXiv preprint arXiv:2109.11295, 2021 - arxiv.org
Knowledge distillation (KD) has been proven effective for compressing large-scale pre-
trained language models. However, existing methods conduct KD statically, e.g., the student …

Improving multi-task deep neural networks via knowledge distillation for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1904.09482, 2019 - arxiv.org
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural
Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural …

Knowledge editing for large language models: A survey

S Wang, Y Zhu, H Liu, Z Zheng, C Chen, J Li - ACM Computing Surveys, 2024 - dl.acm.org
Large Language Models (LLMs) have recently transformed both the academic and industrial
landscapes due to their remarkable capacity to understand, analyze, and generate texts …