Natural language generation for effective knowledge distillation

R Tang, Y Lu, J Lin - Proceedings of the 2nd Workshop on Deep …, 2019 - aclanthology.org
Knowledge distillation can effectively transfer knowledge from BERT, a deep
language representation model, to traditional, shallow word embedding-based neural …
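
As background for this entry and several below, here is a minimal sketch of the standard distillation objective: a softened-logit KL term plus hard-label cross-entropy. The temperature and mixing weight are illustrative defaults, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: soft-target KL at temperature T plus hard-label CE.
    T and alpha are illustrative hyperparameters, not values from the paper."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # rescale so the soft term matches the hard-loss magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# toy usage with random logits
student_logits = torch.randn(8, 3)
teacher_logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```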

Adversarial data augmentation for task-specific knowledge distillation of pre-trained transformers

M Zhang, NU Naresh, Y He - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for
various natural language processing tasks. However, the huge size of these models brings …
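
The title points to adversarial data augmentation for distillation; the sketch below shows one generic way to realize that idea, an FGSM-style perturbation of input embeddings added to the distillation batch. It is an assumption-laden illustration, not the authors' exact augmentation procedure.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # soft-target KL between student and teacher at temperature T
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def adversarial_kd_step(student, teacher, embeds, epsilon=1e-2, T=2.0):
    # Distill on clean embeddings plus an FGSM-style perturbation of them.
    # A generic illustration; the paper's augmentation strategy may differ.
    embeds = embeds.detach().requires_grad_(True)
    with torch.no_grad():
        teacher_clean = teacher(embeds)
    loss_clean = kd_loss(student(embeds), teacher_clean, T)

    # perturb the inputs in the direction that most increases the KD loss
    grad, = torch.autograd.grad(loss_clean, embeds, retain_graph=True)
    adv_embeds = (embeds + epsilon * grad.sign()).detach()

    with torch.no_grad():
        teacher_adv = teacher(adv_embeds)
    loss_adv = kd_loss(student(adv_embeds), teacher_adv, T)
    return loss_clean + loss_adv

# toy usage: linear "teacher" and "student" over 16-dimensional embeddings
teacher = torch.nn.Linear(16, 3)
student = torch.nn.Linear(16, 3)
embeds = torch.randn(8, 16)
loss = adversarial_kd_step(student, teacher, embeds)
loss.backward()
print(loss.item())
```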

LRC-BERT: latent-representation contrastive knowledge distillation for natural language understanding

H Fu, S Zhou, Q Yang, J Tang, G Liu, K Liu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Pre-trained models such as BERT have achieved great results on various natural
language processing problems. However, their large number of parameters requires significant …
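
The title names a contrastive objective on latent representations. Below is a generic InfoNCE-style sketch of contrastive distillation between student and teacher hidden states; the projection layer and temperature are assumptions, and the loss is not LRC-BERT's exact formulation.

```python
import torch
import torch.nn.functional as F

def latent_contrastive_loss(student_hidden, teacher_hidden, temperature=0.1):
    """Each student vector should be closest to the teacher vector of the same
    example and far from the teacher vectors of other examples in the batch.
    A generic InfoNCE-style sketch, not the exact LRC-BERT objective."""
    s = F.normalize(student_hidden, dim=-1)          # (B, d)
    t = F.normalize(teacher_hidden, dim=-1)          # (B, d)
    logits = s @ t.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)          # positives sit on the diagonal

# toy usage: project a 128-dim student state into the teacher's 256-dim space
proj = torch.nn.Linear(128, 256)
student_hidden = proj(torch.randn(8, 128))
teacher_hidden = torch.randn(8, 256)
print(latent_contrastive_loss(student_hidden, teacher_hidden))
```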

TextBrewer: An open-source knowledge distillation toolkit for natural language processing

Z Yang, Y Cui, Z Chen, W Che, T Liu, S Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit
designed for natural language processing. It works with different neural network models and …
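
For orientation, this is the kind of plain teacher-student training loop that such a toolkit wraps behind its configuration objects; it is written in vanilla PyTorch and is not the actual TextBrewer API.

```python
import torch
import torch.nn.functional as F

def distill_epoch(teacher, student, dataloader, optimizer, T=4.0, alpha=0.7):
    """One epoch of a plain teacher-student loop; T and alpha are illustrative."""
    teacher.eval()
    student.train()
    for inputs, labels in dataloader:
        with torch.no_grad():
            teacher_logits = teacher(inputs)
        student_logits = student(inputs)
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        loss = alpha * soft + (1.0 - alpha) * hard
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# toy usage with linear models and random data
teacher = torch.nn.Linear(32, 4)
student = torch.nn.Linear(32, 4)
data = [(torch.randn(16, 32), torch.randint(0, 4, (16,))) for _ in range(5)]
distill_epoch(teacher, student, data, torch.optim.SGD(student.parameters(), lr=0.1))
```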

XtremeDistilTransformers: Task transfer for task-agnostic distillation

S Mukherjee, AH Awadallah, J Gao - arXiv preprint arXiv:2106.04563, 2021 - arxiv.org
While deep and large pre-trained models are the state-of-the-art for various natural
language processing tasks, their huge size poses significant challenges for practical uses in …

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

C Wang, Y Lu, Y Mu, Y Hu, T Xiao, J Zhu - arXiv preprint arXiv:2302.00444, 2023 - arxiv.org
Knowledge distillation addresses the problem of transferring knowledge from a teacher
model to a student model. In this process, we typically have multiple types of knowledge …
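
A hedged sketch of what combining several kinds of knowledge looks like: logits, hidden states, and attention maps distilled under per-source weights. The fixed weights stand in for the paper's learned selection mechanism.

```python
import torch
import torch.nn.functional as F

def multi_knowledge_kd_loss(student_out, teacher_out, weights, T=2.0):
    """Combine several distillation knowledge sources under per-source weights.
    The paper selects which knowledge to use; this sketch takes fixed weights."""
    losses = {
        "logits": F.kl_div(
            F.log_softmax(student_out["logits"] / T, dim=-1),
            F.softmax(teacher_out["logits"] / T, dim=-1),
            reduction="batchmean",
        ) * (T * T),
        "hidden": F.mse_loss(student_out["hidden"], teacher_out["hidden"]),
        "attention": F.mse_loss(student_out["attention"], teacher_out["attention"]),
    }
    return sum(weights[k] * losses[k] for k in losses)

# toy usage with matching shapes (a real setup would first project the student's
# states to the teacher's dimensionality)
student_out = {"logits": torch.randn(8, 3), "hidden": torch.randn(8, 64),
               "attention": torch.randn(8, 12, 16, 16)}
teacher_out = {k: torch.randn_like(v) for k, v in student_out.items()}
weights = {"logits": 1.0, "hidden": 0.5, "attention": 0.5}
print(multi_knowledge_kd_loss(student_out, teacher_out, weights))
```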

Prompting to distill: Boosting data-free knowledge distillation via reinforced prompt

X Ma, X Wang, G Fang, Y Shen, W Lu - arXiv preprint arXiv:2205.07523, 2022 - arxiv.org
Data-free knowledge distillation (DFKD) conducts knowledge distillation by eliminating the
dependence on original training data, and has recently achieved impressive results in …
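
To make the data-free setting concrete, here is a generic DFKD student update in which a noise-conditioned generator supplies pseudo-inputs; the paper instead steers generation with a reinforced prompt, which this sketch does not reproduce.

```python
import torch
import torch.nn.functional as F

def data_free_kd_step(generator, teacher, student, opt_s, batch_size=16, T=2.0):
    """One student update in a generic data-free KD loop: the generator maps
    noise to pseudo-inputs and the student matches the teacher on them."""
    z = torch.randn(batch_size, generator.in_features)
    pseudo_inputs = generator(z)                      # synthetic inputs; no real data used
    with torch.no_grad():
        teacher_logits = teacher(pseudo_inputs)
    student_logits = student(pseudo_inputs.detach())
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
    return loss.item()

# toy usage: linear generator/teacher/student over a 32-dim input space
generator = torch.nn.Linear(8, 32)
teacher = torch.nn.Linear(32, 4)
student = torch.nn.Linear(32, 4)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
print(data_free_kd_step(generator, teacher, student, opt_s))
```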

Self-knowledge distillation in natural language processing

S Hahn, H Choi - arXiv preprint arXiv:1908.01851, 2019 - arxiv.org
Since deep learning became a key player in natural language processing (NLP), many deep
learning models have shown remarkable performance in a variety of NLP tasks …
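
Self-knowledge distillation, in the generic sense sketched below, regularizes the model toward softened predictions from an earlier snapshot of itself, with no separate teacher; this illustrates the idea, not the specific formulation of this paper.

```python
import copy
import torch
import torch.nn.functional as F

def self_distillation_loss(model, prev_model, inputs, labels, T=2.0, alpha=0.3):
    """Distill the model from an earlier frozen copy of itself plus hard labels.
    A generic self-KD sketch; T and alpha are illustrative."""
    logits = model(inputs)
    with torch.no_grad():
        prev_logits = prev_model(inputs)
    soft = F.kl_div(
        F.log_softmax(logits / T, dim=-1),
        F.softmax(prev_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# toy usage: the "teacher" is just a frozen snapshot of the model itself
model = torch.nn.Linear(32, 4)
prev_model = copy.deepcopy(model).eval()
inputs, labels = torch.randn(8, 32), torch.randint(0, 4, (8,))
print(self_distillation_loss(model, prev_model, inputs, labels))
```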

RW-KD: Sample-wise loss terms re-weighting for knowledge distillation

P Lu, A Ghaddar, A Rashid… - Findings of the …, 2021 - aclanthology.org
Knowledge Distillation (KD) is extensively used in Natural Language Processing to
compress the pre-training and task-specific fine-tuning phases of large neural language …
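
A sketch of sample-wise re-weighting: each example gets its own mixing weight between the soft (KD) and hard (CE) loss terms. RW-KD learns these weights adaptively; here they are simply supplied as an argument.

```python
import torch
import torch.nn.functional as F

def reweighted_kd_loss(student_logits, teacher_logits, labels, sample_weights, T=2.0):
    """Per-sample mixing of the soft and hard losses: example i uses weight w_i.
    RW-KD learns the weights; this sketch just takes them as given."""
    log_p = F.log_softmax(student_logits / T, dim=-1)
    q = F.softmax(teacher_logits / T, dim=-1)
    soft_per_sample = F.kl_div(log_p, q, reduction="none").sum(dim=-1) * (T * T)  # (B,)
    hard_per_sample = F.cross_entropy(student_logits, labels, reduction="none")   # (B,)
    w = sample_weights                                                             # (B,) in [0, 1]
    return (w * soft_per_sample + (1.0 - w) * hard_per_sample).mean()

# toy usage with uniform weights (equivalent to a fixed 50/50 mix)
student_logits, teacher_logits = torch.randn(8, 3), torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
print(reweighted_kd_loss(student_logits, teacher_logits, labels,
                         sample_weights=torch.full((8,), 0.5)))
```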

Rethinking Kullback-Leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to
compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
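
The two divergence directions at issue can be stated compactly: forward KL, KL(teacher || student), is the conventional KD choice, while reverse KL, KL(student || teacher), is the alternative this line of work examines. A minimal sketch of both, with illustrative temperature handling:

```python
import torch
import torch.nn.functional as F

def forward_kl(student_logits, teacher_logits, T=1.0):
    """KL(teacher || student): the conventional KD direction."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def reverse_kl(student_logits, teacher_logits, T=1.0):
    """KL(student || teacher): the alternative direction discussed in recent LLM distillation work."""
    return F.kl_div(
        F.log_softmax(teacher_logits / T, dim=-1),
        F.softmax(student_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

# on a toy batch the two directions generally give different values
s, t = torch.randn(4, 10), torch.randn(4, 10)
print(forward_kl(s, t).item(), reverse_kl(s, t).item())
```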