A short study on compressing decoder-based language models

T Li, YE Mesbahi, I Kobyzev, A Rashid… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained Language Models (PLMs) have been successful for a wide range of natural
language processing (NLP) tasks. State-of-the-art PLMs, however, are extremely …

KroneckerBERT: Learning Kronecker decomposition for pre-trained language models via knowledge distillation

MS Tahaei, E Charlaix, VP Nia, A Ghodsi… - arXiv preprint arXiv …, 2021 - arxiv.org
The development of over-parameterized pre-trained language models has made a
significant contribution toward the success of natural language processing. While over …
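
The idea common to the Kronecker-based entries in this list is to replace a large weight matrix with a Kronecker product of two much smaller factors. The sketch below is a minimal, hedged illustration of that single step using Van Loan's nearest-Kronecker-product construction; the shapes and the plain SVD fit are assumptions for illustration, not the papers' training procedure (which additionally uses knowledge distillation).

```python
# Minimal sketch: approximate a weight matrix by a single Kronecker product.
# Shapes and the SVD-based fit are illustrative assumptions.
import torch

def nearest_kronecker(W, a_shape, b_shape):
    """Find A (a_shape) and B (b_shape) minimizing ||W - kron(A, B)||_F
    via Van Loan's rearrangement and a rank-1 SVD."""
    m1, n1 = a_shape
    m2, n2 = b_shape
    assert W.shape == (m1 * m2, n1 * n2)
    # Rearrange W so that kron(A, B) becomes the rank-1 outer product vec(A) vec(B)^T.
    R = (W.reshape(m1, m2, n1, n2)
          .permute(0, 2, 1, 3)          # (m1, n1, m2, n2)
          .reshape(m1 * n1, m2 * n2))
    U, S, Vh = torch.linalg.svd(R, full_matrices=False)
    a = torch.sqrt(S[0]) * U[:, 0]
    b = torch.sqrt(S[0]) * Vh[0, :]
    return a.reshape(m1, n1), b.reshape(m2, n2)

W = torch.randn(768, 768)
A, B = nearest_kronecker(W, (64, 64), (12, 12))
approx = torch.kron(A, B)               # 64*64 + 12*12 parameters instead of 768*768
print(torch.norm(W - approx) / torch.norm(W))
```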

Robustness challenges in model distillation and pruning for natural language understanding

M Du, S Mukherjee, Y Cheng, M Shokouhi… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent work has focused on compressing pre-trained language models (PLMs) like BERT
where the major focus has been to improve the in-distribution performance for downstream …

KroneckerBERT: Significant compression of pre-trained language models through Kronecker decomposition and knowledge distillation

M Tahaei, E Charlaix, V Nia, A Ghodsi… - Proceedings of the …, 2022 - aclanthology.org
The development of over-parameterized pre-trained language models has made a
significant contribution toward the success of natural language processing. While over …

Compressing pre-trained language models by matrix decomposition

MB Noach, Y Goldberg - Proceedings of the 1st Conference of the …, 2020 - aclanthology.org
Large pre-trained language models reach state-of-the-art results on many different NLP
tasks when fine-tuned individually; they also come with a significant memory and …
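
As a rough illustration of matrix-decomposition compression, the sketch below factorizes a linear layer with a truncated SVD into two smaller layers. The rank and layer sizes are assumed for the example, and the fine-tuning stage that follows factorization is omitted.

```python
# Hedged sketch: replace W (out x in) with two smaller factors via truncated SVD.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace y = W x + b with y = U (V x) + b, where U: out x rank, V: rank x in."""
    W = layer.weight.data                       # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                # absorb singular values into U
    V_r = Vh[:rank, :]
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

dense = nn.Linear(768, 3072)
compressed = factorize_linear(dense, rank=128)  # 768*128 + 128*3072 params vs 768*3072
x = torch.randn(4, 768)
print((dense(x) - compressed(x)).abs().max())   # approximation error on random input
```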

Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models

S Wang, C Wang, J Gao, Z Qi, H Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
This study proposes a knowledge distillation algorithm based on large language models
and feature alignment, aiming to effectively transfer the knowledge of large pre-trained …
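
A minimal sketch of the kind of objective such feature-alignment distillation uses: a temperature-scaled KL term on logits plus an MSE term that aligns projected student hidden states with teacher hidden states. The projection layer, loss weights, and temperature here are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a distillation loss combining soft-label KL with feature alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      proj: nn.Linear, temperature=2.0, alpha=0.5):
    # Soft-label term: KL between temperature-scaled teacher and student distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Feature-alignment term: project the narrower student hidden states into the
    # teacher's width and match them with MSE.
    align = F.mse_loss(proj(student_hidden), teacher_hidden)
    return alpha * kd + (1 - alpha) * align

proj = nn.Linear(384, 768)   # assumed student width 384, teacher width 768
s_logits, t_logits = torch.randn(8, 30522), torch.randn(8, 30522)
s_hidden, t_hidden = torch.randn(8, 384), torch.randn(8, 768)
print(distillation_loss(s_logits, t_logits, s_hidden, t_hidden, proj))
```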

Meta-KD: A meta knowledge distillation framework for language model compression across domains

H Pan, C Wang, M Qiu, Y Zhang, Y Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-trained language models have been applied to various NLP tasks with considerable
performance gains. However, the large model sizes, together with the long inference time …

Kronecker decomposition for GPT compression

A Edalati, M Tahaei, A Rashid, VP Nia, JJ Clark… - arXiv preprint arXiv …, 2021 - arxiv.org
GPT is an auto-regressive Transformer-based pre-trained language model which has
attracted a lot of attention in the natural language processing (NLP) domain due to its state …

Revisiting intermediate layer distillation for compressing language models: An overfitting perspective

J Ko, S Park, M Jeong, S Hong, E Ahn… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation (KD) is a highly promising method for mitigating the computational
problems of pre-trained language models (PLMs). Among various KD approaches …

MoEBERT: From BERT to mixture-of-experts via importance-guided adaptation

S Zuo, Q Zhang, C Liang, P He, T Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained language models have demonstrated superior performance in various natural
language processing tasks. However, these models usually contain hundreds of millions of …
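
For intuition, the sketch below shows a feed-forward block replaced by a small mixture of experts with per-token top-1 routing, so only one expert's weights are exercised per token. The expert count and sizes, the routing rule, and the absence of importance-guided initialization are simplifying assumptions rather than MoEBERT's actual adaptation procedure.

```python
# Hedged sketch: a feed-forward block as a small mixture of experts with top-1 routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=768, d_expert=768, num_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(), nn.Linear(d_expert, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        expert_idx = gate.argmax(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask]) * gate[mask, i].unsqueeze(-1)
        return out

layer = MoEFeedForward()
tokens = torch.randn(16, 768)
print(layer(tokens).shape)                       # each token is processed by one expert
```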