Survey on knowledge distillation for large language models: methods, evaluation, and application

C Yang, Y Zhu, W Lu, Y Wang, Q Chen, C Gao… - ACM Transactions on …, 2024 - dl.acm.org
Large Language Models (LLMs) have showcased exceptional capabilities in various
domains, attracting significant interest from both academia and industry. Despite their …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …

DDK: Distilling domain knowledge for efficient large language models

J Liu, C Zhang, J Guo, Y Zhang, H Que, K Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the advanced intelligence abilities of large language models (LLMs) in various
applications, they still face significant computational and storage demands. Knowledge …

MixKD: Towards efficient distillation of large-scale language models

KJ Liang, W Hao, D Shen, Y Zhou, W Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale language models have recently demonstrated impressive empirical
performance. Nevertheless, the improved results are attained at the price of bigger models …

A survey on symbolic knowledge distillation of large language models

K Acharya, A Velasquez… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
This survey article delves into the emerging and critical area of symbolic knowledge
distillation in large language models (LLMs). As LLMs such as generative pretrained …

Revisiting intermediate layer distillation for compressing language models: An overfitting perspective

J Ko, S Park, M Jeong, S Hong, E Ahn… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation (KD) is a highly promising method for mitigating the computational
problems of pre-trained language models (PLMs). Among various KD approaches …
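
Since this entry centers on intermediate-layer distillation, which matches hidden representations rather than only output distributions, here is a minimal sketch of that general idea, assuming placeholder teacher/student hidden states and a learned projection to bridge their width difference; the layer-mapping and overfitting analysis studied in the paper are not reproduced here.

```python
import torch
import torch.nn as nn

# Placeholder hidden states for one teacher/student layer pair:
# shape (batch, seq_len, hidden_dim); the teacher is typically wider.
teacher_hidden = torch.randn(8, 128, 1024)
student_hidden = torch.randn(8, 128, 512, requires_grad=True)

# Learned projection mapping student states into the teacher's space
# so the two can be compared directly.
proj = nn.Linear(512, 1024)

def intermediate_layer_kd_loss(student_h, teacher_h, proj):
    """MSE between projected student states and detached teacher states."""
    return nn.functional.mse_loss(proj(student_h), teacher_h.detach())

loss = intermediate_layer_kd_loss(student_hidden, teacher_hidden, proj)
loss.backward()  # gradients reach only the student side and the projection
```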

Dual-Space Knowledge Distillation for Large Language Models

S Zhang, X Zhang, Z Sun, Y Chen, J Xu - arXiv preprint arXiv:2406.17328, 2024 - arxiv.org
Knowledge distillation (KD) is known as a promising solution to compress large language
models (LLMs) via transferring their knowledge to smaller models. During this process, white …

Rethinking Kullback-Leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to
compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
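
Because this entry turns on the direction of the divergence, the following sketch spells out the two objectives usually contrasted: forward KL, KL(teacher || student), versus reverse KL, KL(student || teacher), computed over the token distributions of a white-box teacher and student. The tensors are placeholders, not the paper's code, and no claim is made here about which direction is preferable.

```python
import torch
import torch.nn.functional as F

# Placeholder next-token logits over a vocabulary for a batch of positions.
teacher_logits = torch.randn(8, 32000)
student_logits = torch.randn(8, 32000, requires_grad=True)

log_p = F.log_softmax(teacher_logits, dim=-1)  # teacher log-probabilities (fixed)
log_q = F.log_softmax(student_logits, dim=-1)  # student log-probabilities
p, q = log_p.exp(), log_q.exp()

# Forward KL, KL(p || q): the classic distillation objective (mean-seeking).
forward_kl = (p * (log_p - log_q)).sum(dim=-1).mean()

# Reverse KL, KL(q || p): mode-seeking; its relative merits for LLM
# distillation are exactly what this line of work debates.
reverse_kl = (q * (log_q - log_p)).sum(dim=-1).mean()
```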

Knowledge distillation of transformer-based language models revisited

C Lu, J Zhang, Y Chu, Z Chen, J Zhou, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
In the past few years, transformer-based pre-trained language models have achieved
astounding success in both industry and academia. However, the large model size and high …

Knowledge Distillation for Closed-Source Language Models

H Chen, X Quan, H Chen, M Yan, J Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Closed-source language models such as GPT-4 have achieved remarkable performance.
Many recent studies focus on enhancing the capabilities of smaller models through …
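
In the closed-source setting of this last entry, only teacher-generated text is typically available, so distillation often reduces to fine-tuning the student on that text (black-box, sequence-level KD). Below is a generic sketch under that assumption, using Hugging Face transformers with placeholder teacher responses standing in for real API output; this is not the specific method of the cited paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder (prompt, teacher response) pairs; in practice these would be
# collected from the closed-source teacher, which exposes only text output.
pairs = [
    ("Explain knowledge distillation in one sentence.",
     "Knowledge distillation trains a small student to imitate a larger teacher."),
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # small open student, for illustration
student = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

for prompt, response in pairs:
    # Standard causal-LM fine-tuning on prompt + teacher response
    # (prompt tokens are often masked out of the loss; omitted here for brevity).
    text = prompt + " " + response + tokenizer.eos_token
    inputs = tokenizer(text, return_tensors="pt")
    loss = student(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```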