Contrastive distillation on intermediate representations for language model compression

S Sun, Z Gan, Y Cheng, Y Fang, S Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
Existing language model compression methods mostly use a simple L2 loss to distill
knowledge in the intermediate representations of a large BERT model to a smaller one …
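The snippet contrasts a plain L2 (MSE) loss on intermediate hidden states with the contrastive objective named in the title. As a minimal sketch (not the paper's exact formulation; function names and the use of pooled, equal-width hidden states are assumptions, and a linear projection would be needed if student and teacher widths differ):

```python
import torch
import torch.nn.functional as F

def mse_distill(student_h, teacher_h):
    # Plain L2 baseline: directly match student and teacher hidden states.
    return F.mse_loss(student_h, teacher_h)

def contrastive_distill(student_h, teacher_h, temperature=0.1):
    # InfoNCE-style objective: each student vector should be most similar to
    # its own teacher vector among all teacher vectors in the batch.
    s = F.normalize(student_h, dim=-1)            # (batch, dim)
    t = F.normalize(teacher_h, dim=-1)            # (batch, dim)
    logits = s @ t.t() / temperature              # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)
```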

Less is more: Task-aware layer-wise distillation for language model compression

C Liang, S Zuo, Q Zhang, P He… - … on Machine Learning, 2023 - proceedings.mlr.press
Layer-wise distillation is a powerful tool to compress large models (i.e., teacher models) into
small ones (i.e., student models). The student distills knowledge from the teacher by …
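For context, a generic layer-wise distillation loss with a uniform teacher-to-student layer mapping looks roughly as follows; the task-aware weighting this paper adds is not reproduced here, and the argument names and the `proj` linear layer are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def layerwise_distill_loss(student_hiddens, teacher_hiddens, proj):
    # student_hiddens: list of (batch, seq, d_s) tensors, one per student layer
    # teacher_hiddens: list of (batch, seq, d_t) tensors, one per teacher layer
    # proj: nn.Linear(d_s, d_t) mapping student width to teacher width
    k = len(teacher_hiddens) // len(student_hiddens)   # uniform layer mapping
    loss = 0.0
    for i, s_h in enumerate(student_hiddens):
        t_h = teacher_hiddens[(i + 1) * k - 1]          # student layer i <- teacher layer (i+1)*k
        loss = loss + F.mse_loss(proj(s_h), t_h)
    return loss / len(student_hiddens)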

Meta-KD: A meta knowledge distillation framework for language model compression across domains

H Pan, C Wang, M Qiu, Y Zhang, Y Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-trained language models have been applied to various NLP tasks with considerable
performance gains. However, the large model sizes, together with the long inference time …

Compression of generative pre-trained language models via quantization

C Tao, L Hou, W Zhang, L Shang, X Jiang, Q Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
The increasing size of generative Pre-trained Language Models (PLMs) has greatly
increased the demand for model compression. Despite various methods to compress BERT …
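As a reference point for the quantization family of methods (this is generic symmetric uniform quantization, not the scheme proposed in the paper; names are illustrative):

```python
import torch

def quantize_weight(w, n_bits=8):
    # Symmetric uniform quantization: map float weights to signed integers
    # and keep the per-tensor scale for dequantization at inference time.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize_weight(q, scale):
    return q.float() * scale
```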

Language model compression with weighted low-rank factorization

YC Hsu, T Hua, S Chang, Q Lou, Y Shen… - arXiv preprint arXiv …, 2022 - arxiv.org
Factorizing a large matrix into small matrices is a popular strategy for model compression.
Singular value decomposition (SVD) plays a vital role in this compression strategy …
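The baseline this paper builds on is plain truncated SVD of a weight matrix; a minimal sketch is below (the paper's weighting of the factorization is not reproduced, and the function name is an assumption):

```python
import torch

def svd_compress(w, rank):
    # Truncated SVD: approximate a (d_out, d_in) weight matrix by the product
    # of two low-rank factors, cutting parameters from d_out*d_in to
    # rank*(d_out + d_in).
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]      # (d_out, rank)
    b = vh[:rank, :]                # (rank, d_in)
    return a, b                     # w ≈ a @ b
```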

KroneckerBERT: Learning Kronecker decomposition for pre-trained language models via knowledge distillation

MS Tahaei, E Charlaix, VP Nia, A Ghodsi… - arXiv preprint arXiv …, 2021 - arxiv.org
The development of over-parameterized pre-trained language models has made a
significant contribution toward the success of natural language processing. While over …
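To illustrate why Kronecker factorization compresses a weight matrix (an illustrative parameter count only, not the paper's decomposition or training procedure; the shapes are assumptions):

```python
import torch

# A full (768, 3072) matrix has ~2.36M parameters; representing it as
# kron(A, B) with A of shape (24, 48) and B of shape (32, 64) needs ~3.2K.
A = torch.randn(24, 48)
B = torch.randn(32, 64)
W = torch.kron(A, B)                      # shape (24*32, 48*64) = (768, 3072)
print(W.shape, A.numel() + B.numel())     # torch.Size([768, 3072]) 3200
```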

Extreme language model compression with optimal subwords and shared projections

S Zhao, R Gupta, Y Song, D Zhou - 2019 - openreview.net
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet
have recently achieved state-of-the-art performance on a variety of language understanding …

ASVD: Activation-aware singular value decomposition for compressing large language models

Z Yuan, Y Shang, Y Song, Q Wu, Y Yan… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper explores a new post-hoc training-free compression paradigm for compressing
Large Language Models (LLMs) to facilitate their wider adoption in various computing …
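A rough sketch of the activation-aware idea, assuming the weight is rescaled by per-input-channel activation statistics before the SVD and the scaling is folded back into the factors (variable names and the exact choice of statistic are assumptions, not the paper's specification):

```python
import torch

def activation_aware_svd(w, act_scale, rank):
    # w: (d_out, d_in) weight; act_scale: (d_in,) positive per-channel statistic
    # (e.g. mean absolute activation) collected on calibration data.
    # Factorize W*S with truncated SVD, then divide S back out of the right
    # factor so (a @ b) x still approximates W x, while the decomposition
    # prioritizes input channels with large activations.
    u, sigma, vh = torch.linalg.svd(w * act_scale, full_matrices=False)
    a = u[:, :rank] * sigma[:rank]          # (d_out, rank)
    b = vh[:rank, :] / act_scale            # (rank, d_in)
    return a, b
```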

Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models

S Wang, C Wang, J Gao, Z Qi, H Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
This study proposes a knowledge distillation algorithm based on large language models
and feature alignment, aiming to effectively transfer the knowledge of large pre-trained …

BinaryBERT: Pushing the limit of BERT quantization

H Bai, W Zhang, L Hou, L Shang, J Jin, X Jiang… - arXiv preprint arXiv …, 2020 - arxiv.org
The rapid development of large pre-trained language models has greatly increased the
demand for model compression techniques, among which quantization is a popular solution …
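For orientation, a generic 1-bit weight quantizer looks like the sketch below; this is the common sign-plus-scale formulation, not BinaryBERT's specific training recipe, and the per-row scaling choice is an assumption:

```python
import torch

def binarize_weight(w):
    # Replace each row of W with its sign, scaled by the row's mean absolute
    # value so the average magnitude is preserved (a straight-through
    # estimator would handle the gradient during training).
    alpha = w.abs().mean(dim=1, keepdim=True)   # (d_out, 1) per-row scale
    return alpha * torch.sign(w)
```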