LRC-BERT: Latent-representation contrastive knowledge distillation for natural language understanding

H Fu, S Zhou, Q Yang, J Tang, G Liu, K Liu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Pre-trained models such as BERT have achieved strong results on various natural
language processing problems. However, their large number of parameters requires significant …
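
As an editorial illustration of the technique named in the title, here is a minimal PyTorch sketch of a contrastive (InfoNCE-style) distillation loss between student and teacher intermediate representations; the projection layer, hidden sizes, and temperature are illustrative assumptions, not LRC-BERT's actual formulation:

import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_h, teacher_h, temperature=0.1):
    # InfoNCE-style objective: each student vector should be most similar to
    # the teacher vector of the same example; the other examples in the batch
    # act as negatives. Both inputs have shape (batch, dim).
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.t() / temperature                  # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

# Toy usage: a linear head (an assumption) projects the student's hidden
# states into the teacher's hidden size before the contrastive term.
batch, d_student, d_teacher = 8, 312, 768
proj = torch.nn.Linear(d_student, d_teacher)
student_h = proj(torch.randn(batch, d_student))
teacher_h = torch.randn(batch, d_teacher)             # stands in for frozen teacher activations
loss = contrastive_distill_loss(student_h, teacher_h)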

Adversarial data augmentation for task-specific knowledge distillation of pre-trained transformers

M Zhang, NU Naresh, Y He - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for
various natural language processing tasks. However, the huge size of these models brings …
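
A rough sketch of the underlying idea of adversarial data augmentation for distillation, assuming a single FGSM-style step on the student's input embeddings; the epsilon, the toy student, and the loss mixing are assumptions, not the paper's exact procedure:

import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label distillation: KL divergence between temperature-scaled
    # student and teacher distributions.
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T

def adversarial_augmented_kd(student, embeddings, teacher_logits, eps=1e-2):
    # One FGSM-style step on the input embeddings: the perturbation that most
    # increases the distillation loss yields a harder, augmented example.
    embeddings = embeddings.clone().detach().requires_grad_(True)
    loss_clean = kd_loss(student(embeddings), teacher_logits)
    grad, = torch.autograd.grad(loss_clean, embeddings, retain_graph=True)
    adv_embeddings = (embeddings + eps * grad.sign()).detach()
    loss_adv = kd_loss(student(adv_embeddings), teacher_logits)
    return loss_clean + loss_adv

# Toy student: a linear classifier over flattened token embeddings.
student = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32, 3))
emb = torch.randn(4, 16, 32)                # (batch, seq_len, hidden)
teacher_logits = torch.randn(4, 3)          # stands in for frozen teacher outputs
loss = adversarial_augmented_kd(student, emb, teacher_logits)
loss.backward()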

Knowledge prompting in pre-trained language model for natural language understanding

J Wang, W Huang, Q Shi, H Wang, M Qiu, X Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Knowledge-enhanced Pre-trained Language Models (PLMs), which aim to incorporate factual
knowledge into PLMs, have recently received significant attention. However, most existing …
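
A hedged sketch of knowledge prompting in its most generic form: retrieved facts are verbalized and prepended to the input so the PLM can condition on them. The triple format and template below are purely illustrative, not the paper's method:

# Verbalize knowledge-graph triples and prepend them to the input so a PLM
# can condition on explicit factual context (hypothetical helper functions).
def verbalize(triples):
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)

def build_knowledge_prompt(question, triples):
    return f"Knowledge: {verbalize(triples)}\nQuestion: {question}\nAnswer:"

triples = [("Paris", "capital_of", "France"), ("France", "located_in", "Europe")]
print(build_knowledge_prompt("Which country is Paris the capital of?", triples))
# The resulting prompt can then be scored or completed by any pre-trained LM.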

Greedy-layer pruning: Speeding up transformer models for natural language processing

D Peer, S Stabinger, S Engl… - Pattern recognition …, 2022 - Elsevier
Fine-tuning transformer models after unsupervised pre-training achieves very high
performance on many different natural language processing tasks. Unfortunately …
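
The title describes a simple greedy procedure: repeatedly remove the transformer layer whose removal hurts a validation metric the least. A minimal sketch, assuming a BERT-style model whose encoder layers sit in a ModuleList (as in Hugging Face's BertModel) and a user-supplied evaluate function; after pruning, the smaller model would normally be fine-tuned again:

import copy

def greedy_layer_prune(model, evaluate, layers_to_remove=2):
    # Iteratively drop the encoder layer whose removal hurts the validation
    # metric (higher is better) the least. Assumes the layers live in
    # model.encoder.layer, a torch.nn.ModuleList, and that evaluate(model)
    # returns e.g. dev-set accuracy.
    for _ in range(layers_to_remove):
        best_score, best_idx = float("-inf"), None
        for i in range(len(model.encoder.layer)):
            candidate = copy.deepcopy(model)   # simple but memory-hungry; fine for a sketch
            del candidate.encoder.layer[i]     # ModuleList supports item deletion
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_idx = score, i
        del model.encoder.layer[best_idx]      # commit the least-harmful removal
    return model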

Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation

Y Wu, M Rezagholizadeh, A Ghaddar… - Proceedings of the …, 2021 - aclanthology.org
Intermediate layer matching has been shown to be an effective approach for improving knowledge
distillation (KD). However, this technique applies matching in the hidden spaces of two …
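
For context, conventional intermediate-layer matching projects student hidden states into the teacher's hidden space and minimizes an MSE between matched layers; Universal-KD instead grounds both sides in the output (label) space via attention. The sketch below shows only the conventional projection-plus-MSE baseline, as an assumed point of comparison:

import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateLayerKD(nn.Module):
    # Match selected student layers to teacher layers with learned linear
    # projections and an MSE loss (plain hidden-space matching).
    def __init__(self, d_student, d_teacher, num_matched_layers):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Linear(d_student, d_teacher) for _ in range(num_matched_layers))

    def forward(self, student_hiddens, teacher_hiddens):
        # Both arguments: lists of (batch, seq, dim) tensors, one per matched pair.
        losses = [F.mse_loss(p(s), t.detach())
                  for p, s, t in zip(self.proj, student_hiddens, teacher_hiddens)]
        return sum(losses) / len(losses)

# Toy usage with two matched layer pairs.
kd = IntermediateLayerKD(d_student=312, d_teacher=768, num_matched_layers=2)
s = [torch.randn(4, 16, 312) for _ in range(2)]
t = [torch.randn(4, 16, 768) for _ in range(2)]
loss = kd(s, t)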

Natural language generation for effective knowledge distillation

R Tang, Y Lu, J Lin - Proceedings of the 2nd Workshop on Deep …, 2019 - aclanthology.org
Knowledge distillation can effectively transfer knowledge from BERT, a deep
language representation model, to traditional, shallow word-embedding-based neural …
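
One way to realize this idea (a sketch, not necessarily the paper's exact pipeline) is to sample task-like text from a generative LM, label it with the teacher's soft predictions, and train the shallow student on that transfer set; the GPT-2 generator and placeholder teacher below are assumptions:

import torch
import torch.nn.functional as F
from transformers import pipeline

# 1. Sample synthetic, unlabeled task-like text from a generative LM.
generator = pipeline("text-generation", model="gpt2")
prompts = ["The movie was", "I really think this restaurant"]
synthetic = []
for p in prompts:
    out = generator(p, max_new_tokens=20, num_return_sequences=1)
    synthetic.append(out[0]["generated_text"])

# 2. Label the synthetic text with a fine-tuned teacher's soft predictions
#    (random logits stand in for a real teacher here).
teacher_logits = torch.randn(len(synthetic), 2)
soft_labels = F.softmax(teacher_logits / 2.0, dim=-1)   # temperature 2.0

# 3. The shallow student (e.g. a BiLSTM over word embeddings) is then trained
#    to match soft_labels with a KL/cross-entropy objective (omitted here;
#    see the distillation loss sketched under the MT-DNN entry below).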

ERNIE: Enhanced representation through knowledge integration

Y Sun, S Wang, Y Li, S Feng, X Chen, H Zhang… - arXiv preprint arXiv …, 2019 - arxiv.org
We present ERNIE (Enhanced Representation through kNowledge IntEgration), a novel
knowledge-enhanced language representation model. Inspired by the masking …
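
The core mechanism, entity-level masking, can be sketched as follows: instead of masking tokens independently, all tokens of a selected entity span are masked together so the model must predict the whole unit from context. The tokenization, span selection, and probabilities below are simplified assumptions:

import random

MASK = "[MASK]"

def entity_level_mask(tokens, entity_spans, mask_prob=0.15, seed=0):
    # Mask whole entity spans rather than independent tokens.
    # entity_spans: list of (start, end) index pairs, end exclusive.
    rng = random.Random(seed)
    masked = list(tokens)
    for start, end in entity_spans:
        if rng.random() < mask_prob:        # one decision per span, not per token
            for i in range(start, end):
                masked[i] = MASK
    return masked

tokens = "Harry Potter is a series of novels written by J K Rowling".split()
entity_spans = [(0, 2), (9, 12)]            # "Harry Potter", "J K Rowling"
# A high mask_prob is used here only so the example visibly masks both spans.
print(entity_level_mask(tokens, entity_spans, mask_prob=0.9))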

TextBrewer: An open-source knowledge distillation toolkit for natural language processing

Z Yang, Y Cui, Z Chen, W Che, T Liu, S Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit
designed for natural language processing. It works with different neural network models and …

Improving multi-task deep neural networks via knowledge distillation for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1904.09482, 2019 - arxiv.org
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural
Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural …
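
The distillation objective in this line of work is essentially the classic soft-target loss: the student matches temperature-softened probabilities from a teacher (here an ensemble average, since MT-DNN distills an ensemble), optionally mixed with the hard-label cross-entropy. A generic sketch; the temperature and mixing weight are assumptions, not the paper's values:

import torch
import torch.nn.functional as F

def ensemble_soft_targets(teacher_logits_list, temperature=2.0):
    # Average temperature-softened probabilities from several teachers.
    probs = [F.softmax(l / temperature, dim=-1) for l in teacher_logits_list]
    return torch.stack(probs).mean(dim=0)

def soft_target_kd_loss(student_logits, soft_targets, labels,
                        temperature=2.0, alpha=0.5):
    # Mix the KL term against the (ensemble) soft targets with the ordinary
    # cross-entropy on the gold labels.
    soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                    soft_targets, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: three teachers' logits are averaged into soft targets.
teachers = [torch.randn(4, 3) for _ in range(3)]
student_logits = torch.randn(4, 3, requires_grad=True)
labels = torch.tensor([0, 2, 1, 0])
loss = soft_target_kd_loss(student_logits, ensemble_soft_targets(teachers), labels)
loss.backward()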

Adaptive contrastive knowledge distillation for BERT compression

J Guo, J Liu, Z Wang, Y Ma, R Gong… - Findings of the …, 2023 - aclanthology.org
In this paper, we propose a new knowledge distillation approach called adaptive contrastive
knowledge distillation (ACKD) for BERT compression. Different from existing knowledge …