This survey presents an in-depth exploration of knowledge distillation (KD) techniques within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …
Despite the advanced capabilities of large language models (LLMs) across various applications, they still impose significant computational and storage demands. Knowledge …
Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results come at the cost of increasingly large models …
This survey article delves into the emerging and critical area of symbolic knowledge distillation in large language models (LLMs). As LLMs such as generative pretrained …
J Ko, S Park, M Jeong, S Hong, E Ahn… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation (KD) is a highly promising method for mitigating the computational problems of pre-trained language models (PLMs). Among various KD approaches …
Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white …
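The snippet above describes the white-box setting, where the student can see the teacher's output distribution rather than only its generated text. As a point of reference, here is a minimal PyTorch sketch of the generic soft-target (logit) distillation objective used in that setting; the function name, temperature value, and scaling are illustrative assumptions, not details from the cited paper.

```python
import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """Generic white-box KD loss: match the teacher's temperature-softened
    token distribution (forward KL), scaled by T^2 so gradient magnitude
    stays comparable across temperatures."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # F.kl_div expects log-probabilities for the student (input)
    # and probabilities for the teacher (target).
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t ** 2)
```

In practice this term is usually added to the ordinary next-token cross-entropy loss with a weighting coefficient chosen on a validation set.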
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
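Because this snippet turns on which direction of the KL divergence is optimized, a brief sketch of both objectives may help; the characterizations in the comments reflect the common framing in the KD literature (which the cited paper revisits), and the function names are illustrative, not taken from that work.

```python
import torch
import torch.nn.functional as F

def forward_kl(student_logits: torch.Tensor,
               teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student): often described as mean-seeking, pushing the
    student to spread probability over every token the teacher supports."""
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

def reverse_kl(student_logits: torch.Tensor,
               teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(student || teacher): often described as mode-seeking, letting the
    student concentrate on the teacher's high-probability tokens."""
    student_probs = F.softmax(student_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    return (student_probs * (student_log_probs - teacher_log_probs)).sum(dim=-1).mean()
```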
C Lu, J Zhang, Y Chu, Z Chen, J Zhou, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
In the past few years, transformer-based pre-trained language models have achieved astounding success in both industry and academia. However, the large model size and high …
H Chen, X Quan, H Chen, M Yan, J Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Closed-source language models such as GPT-4 have achieved remarkable performance. Many recent studies focus on enhancing the capabilities of smaller models through …