MixKD: Towards efficient distillation of large-scale language models

KJ Liang, W Hao, D Shen, Y Zhou, W Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale language models have recently demonstrated impressive empirical
performance. Nevertheless, the improved results are attained at the price of bigger models …

Dynamic knowledge distillation for pre-trained language models

L Li, Y Lin, S Ren, P Li, J Zhou, X Sun - arXiv preprint arXiv:2109.11295, 2021 - arxiv.org
Knowledge distillation (KD) has been proven effective for compressing large-scale pre-
trained language models. However, existing methods conduct KD statically, e.g., the student …

Cost-effective distillation of large language models

S Dasgupta, T Cohn, T Baldwin - Findings of the Association for …, 2023 - aclanthology.org
Knowledge distillation (KD) involves training a small “student” model to replicate the
strong performance of a high-capacity “teacher” model, enabling efficient deployment in …
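For background on the teacher-student setup this entry describes, the following is a minimal sketch of the classic soft-label distillation objective, not the specific method of this or any other paper listed here; the function name kd_loss, the temperature, and the mixing weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            temperature: float = 2.0,
            alpha: float = 0.5) -> torch.Tensor:
    """Classic soft-label distillation loss: a weighted sum of the KL divergence
    between temperature-softened teacher and student distributions and the usual
    cross-entropy on hard labels (generic illustration, hyperparameters assumed)."""
    # Soften both distributions with the temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```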

DistiLLM: Towards streamlined distillation for large language models

J Ko, S Kim, T Chen, SY Yun - arXiv preprint arXiv:2402.03898, 2024 - arxiv.org
Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller
student model, reducing its inference cost and memory footprint while preserving model …

Survey on knowledge distillation for large language models: methods, evaluation, and application

C Yang, Y Zhu, W Lu, Y Wang, Q Chen, C Gao… - ACM Transactions on …, 2024 - dl.acm.org
Large Language Models (LLMs) have showcased exceptional capabilities in various
domains, attracting significant interest from both academia and industry. Despite their …

DDK: Distilling domain knowledge for efficient large language models

J Liu, C Zhang, J Guo, Y Zhang, H Que, K Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the advanced intelligence abilities of large language models (LLMs) in various
applications, they still face significant computational and storage demands. Knowledge …

Knowledge distillation of large language models

Y Gu, L Dong, F Wei, M Huang - arXiv preprint arXiv:2306.08543, 2023 - arxiv.org
Knowledge Distillation (KD) is a promising technique for reducing the high computational
demand of large language models (LLMs). However, previous KD methods are primarily …

Meta-KD: A meta knowledge distillation framework for language model compression across domains

H Pan, C Wang, M Qiu, Y Zhang, Y Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-trained language models have been applied to various NLP tasks with considerable
performance gains. However, the large model sizes, together with the long inference time …

HomoDistil: Homotopic task-agnostic distillation of pre-trained transformers

C Liang, H Jiang, Z Li, X Tang, B Yin, T Zhao - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation has been shown to be a powerful model compression approach to
facilitate the deployment of pre-trained language models in practice. This paper focuses on …

GKD: Generalized knowledge distillation for auto-regressive sequence models

R Agarwal, N Vieillard, P Stanczyk, S Ramos… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation is commonly used for compressing neural networks to reduce their
inference cost and memory footprint. However, current distillation methods for auto …
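Several entries above target auto-regressive sequence models. As a point of reference only, the sketch below shows the common token-level forward-KL baseline that such work builds on or generalizes; it is an assumed illustration, not the objective proposed in GKD or any other paper listed here.

```python
import torch
import torch.nn.functional as F

def token_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        attention_mask: torch.Tensor,
                        temperature: float = 1.0) -> torch.Tensor:
    """Per-token forward KL between teacher and student next-token distributions,
    averaged over non-padding positions (illustrative baseline, not a specific
    paper's method). Logits: (batch, seq_len, vocab); mask: (batch, seq_len)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student) at each position, summed over the vocabulary.
    per_token_kl = (p_teacher * (p_teacher.clamp_min(1e-9).log() - log_p_student)).sum(-1)
    mask = attention_mask.float()
    # Average only over real (non-padding) tokens.
    return (per_token_kl * mask).sum() / mask.sum()
```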