LRC-BERT: Latent-representation contrastive knowledge distillation for natural language understanding

H Fu, S Zhou, Q Yang, J Tang, G Liu, K Liu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Pre-trained models such as BERT have achieved strong results on various natural
language processing problems. However, their large number of parameters requires significant …
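
As an editorial illustration of the technique named in the title, here is a minimal PyTorch sketch of a contrastive (InfoNCE-style) distillation loss between student and teacher intermediate representations; the projection layer, hidden sizes, and temperature are illustrative assumptions, not LRC-BERT's actual formulation:

import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_h, teacher_h, temperature=0.1):
    # InfoNCE-style objective: each student vector should be most similar to
    # the teacher vector of the same example; the other examples in the batch
    # act as negatives. Both inputs have shape (batch, dim).
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.t() / temperature                  # (batch, batch) similarities
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

# Toy usage: a linear head (an assumption) projects the student's hidden
# states into the teacher's hidden size before the contrastive term.
batch, d_student, d_teacher = 8, 312, 768
proj = torch.nn.Linear(d_student, d_teacher)
student_h = proj(torch.randn(batch, d_student))
teacher_h = torch.randn(batch, d_teacher)             # stands in for frozen teacher activations
loss = contrastive_distill_loss(student_h, teacher_h)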

Adversarial data augmentation for task-specific knowledge distillation of pre-trained transformers

M Zhang, NU Naresh, Y He - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for
various natural language processing tasks. However, the huge size of these models brings …
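
A rough sketch of the underlying idea of adversarial data augmentation for distillation, assuming a single FGSM-style step on the student's input embeddings; the epsilon, the toy student, and the loss mixing are assumptions, not the paper's exact procedure:

import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label distillation: KL divergence between temperature-scaled
    # student and teacher distributions.
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T

def adversarial_augmented_kd(student, embeddings, teacher_logits, eps=1e-2):
    # One FGSM-style step on the input embeddings: the perturbation that most
    # increases the distillation loss yields a harder, augmented example.
    embeddings = embeddings.clone().detach().requires_grad_(True)
    loss_clean = kd_loss(student(embeddings), teacher_logits)
    grad, = torch.autograd.grad(loss_clean, embeddings, retain_graph=True)
    adv_embeddings = (embeddings + eps * grad.sign()).detach()
    loss_adv = kd_loss(student(adv_embeddings), teacher_logits)
    return loss_clean + loss_adv

# Toy student: a linear classifier over flattened token embeddings.
student = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32, 3))
emb = torch.randn(4, 16, 32)                # (batch, seq_len, hidden)
teacher_logits = torch.randn(4, 3)          # stands in for frozen teacher outputs
loss = adversarial_augmented_kd(student, emb, teacher_logits)
loss.backward()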

Knowledge prompting in pre-trained language model for natural language understanding

J Wang, W Huang, Q Shi, H Wang, M Qiu, X Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Knowledge-enhanced Pre-trained Language Models (PLMs), which aim to incorporate factual
knowledge into PLMs, have recently received significant attention. However, most existing …
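
A hedged sketch of knowledge prompting in its most generic form: retrieved facts are verbalized and prepended to the input so the PLM can condition on them. The triple format and template below are purely illustrative, not the paper's method:

# Verbalize knowledge-graph triples and prepend them to the input so a PLM
# can condition on explicit factual context (hypothetical helper functions).
def verbalize(triples):
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)

def build_knowledge_prompt(question, triples):
    return f"Knowledge: {verbalize(triples)}\nQuestion: {question}\nAnswer:"

triples = [("Paris", "capital_of", "France"), ("France", "located_in", "Europe")]
print(build_knowledge_prompt("Which country is Paris the capital of?", triples))
# The resulting prompt can then be scored or completed by any pre-trained LM.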

Greedy-layer pruning: Speeding up transformer models for natural language processing

D Peer, S Stabinger, S Engl… - Pattern recognition …, 2022 - Elsevier
Fine-tuning transformer models after unsupervised pre-training achieves very high
performance on many different natural language processing tasks. Unfortunately …
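
The title describes a simple greedy procedure: repeatedly remove the transformer layer whose removal hurts a validation metric the least. A minimal sketch, assuming a BERT-style model whose encoder layers sit in a ModuleList (as in Hugging Face's BertModel) and a user-supplied evaluate function; after pruning, the smaller model would normally be fine-tuned again:

import copy

def greedy_layer_prune(model, evaluate, layers_to_remove=2):
    # Iteratively drop the encoder layer whose removal hurts the validation
    # metric (higher is better) the least. Assumes the layers live in
    # model.encoder.layer, a torch.nn.ModuleList, and that evaluate(model)
    # returns e.g. dev-set accuracy.
    for _ in range(layers_to_remove):
        best_score, best_idx = float("-inf"), None
        for i in range(len(model.encoder.layer)):
            candidate = copy.deepcopy(model)   # simple but memory-hungry; fine for a sketch
            del candidate.encoder.layer[i]     # ModuleList supports item deletion
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_idx = score, i
        del model.encoder.layer[best_idx]      # commit the least-harmful removal
    return model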

Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation

Y Wu, M Rezagholizadeh, A Ghaddar… - Proceedings of the …, 2021 - aclanthology.org
Intermediate layer matching has been shown to be an effective approach for improving knowledge
distillation (KD). However, this technique applies matching in the hidden spaces of two …
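
For context, conventional intermediate-layer matching projects student hidden states into the teacher's hidden space and minimizes an MSE between matched layers; Universal-KD instead grounds both sides in the output (label) space via attention. The sketch below shows only the conventional projection-plus-MSE baseline, as an assumed point of comparison:

import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateLayerKD(nn.Module):
    # Match selected student layers to teacher layers with learned linear
    # projections and an MSE loss (plain hidden-space matching).
    def __init__(self, d_student, d_teacher, num_matched_layers):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Linear(d_student, d_teacher) for _ in range(num_matched_layers))

    def forward(self, student_hiddens, teacher_hiddens):
        # Both arguments: lists of (batch, seq, dim) tensors, one per matched pair.
        losses = [F.mse_loss(p(s), t.detach())
                  for p, s, t in zip(self.proj, student_hiddens, teacher_hiddens)]
        return sum(losses) / len(losses)

# Toy usage with two matched layer pairs.
kd = IntermediateLayerKD(d_student=312, d_teacher=768, num_matched_layers=2)
s = [torch.randn(4, 16, 312) for _ in range(2)]
t = [torch.randn(4, 16, 768) for _ in range(2)]
loss = kd(s, t)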

Natural language generation for effective knowledge distillation

R Tang, Y Lu, J Lin - Proceedings of the 2nd Workshop on Deep …, 2019 - aclanthology.org
Knowledge distillation can effectively transfer knowledge from BERT, a deep
language representation model, to traditional, shallow word-embedding-based neural …
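
One way to realize this idea (a sketch, not necessarily the paper's exact pipeline) is to sample task-like text from a generative LM, label it with the teacher's soft predictions, and train the shallow student on that transfer set; the GPT-2 generator and placeholder teacher below are assumptions:

import torch
import torch.nn.functional as F
from transformers import pipeline

# 1. Sample synthetic, unlabeled task-like text from a generative LM.
generator = pipeline("text-generation", model="gpt2")
prompts = ["The movie was", "I really think this restaurant"]
synthetic = []
for p in prompts:
    out = generator(p, max_new_tokens=20, num_return_sequences=1)
    synthetic.append(out[0]["generated_text"])

# 2. Label the synthetic text with a fine-tuned teacher's soft predictions
#    (random logits stand in for a real teacher here).
teacher_logits = torch.randn(len(synthetic), 2)
soft_labels = F.softmax(teacher_logits / 2.0, dim=-1)   # temperature 2.0

# 3. The shallow student (e.g. a BiLSTM over word embeddings) is then trained
#    to match soft_labels with a KL/cross-entropy objective (omitted here;
#    see the distillation loss sketched under the MT-DNN entry below).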

ERNIE: Enhanced representation through knowledge integration

Y Sun, S Wang, Y Li, S Feng, X Chen, H Zhang… - arXiv preprint arXiv …, 2019 - arxiv.org
We present ERNIE (Enhanced Representation through kNowledge IntEgration), a novel
knowledge-enhanced language representation model. Inspired by the masking …
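
The core mechanism, entity-level masking, can be sketched as follows: instead of masking tokens independently, all tokens of a selected entity span are masked together so the model must predict the whole unit from context. The tokenization, span selection, and probabilities below are simplified assumptions:

import random

MASK = "[MASK]"

def entity_level_mask(tokens, entity_spans, mask_prob=0.15, seed=0):
    # Mask whole entity spans rather than independent tokens.
    # entity_spans: list of (start, end) index pairs, end exclusive.
    rng = random.Random(seed)
    masked = list(tokens)
    for start, end in entity_spans:
        if rng.random() < mask_prob:        # one decision per span, not per token
            for i in range(start, end):
                masked[i] = MASK
    return masked

tokens = "Harry Potter is a series of novels written by J K Rowling".split()
entity_spans = [(0, 2), (9, 12)]            # "Harry Potter", "J K Rowling"
# A high mask_prob is used here only so the example visibly masks both spans.
print(entity_level_mask(tokens, entity_spans, mask_prob=0.9))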

TextBrewer: An open-source knowledge distillation toolkit for natural language processing

Z Yang, Y Cui, Z Chen, W Che, T Liu, S Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit
designed for natural language processing. It works with different neural network models and …

Improving multi-task deep neural networks via knowledge distillation for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1904.09482, 2019 - arxiv.org
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural
Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural …
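
The distillation objective in this line of work is essentially the classic soft-target loss: the student matches temperature-softened probabilities from a teacher (here an ensemble average, since MT-DNN distills an ensemble), optionally mixed with the hard-label cross-entropy. A generic sketch; the temperature and mixing weight are assumptions, not the paper's values:

import torch
import torch.nn.functional as F

def ensemble_soft_targets(teacher_logits_list, temperature=2.0):
    # Average temperature-softened probabilities from several teachers.
    probs = [F.softmax(l / temperature, dim=-1) for l in teacher_logits_list]
    return torch.stack(probs).mean(dim=0)

def soft_target_kd_loss(student_logits, soft_targets, labels,
                        temperature=2.0, alpha=0.5):
    # Mix the KL term against the (ensemble) soft targets with the ordinary
    # cross-entropy on the gold labels.
    soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                    soft_targets, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: three teachers' logits are averaged into soft targets.
teachers = [torch.randn(4, 3) for _ in range(3)]
student_logits = torch.randn(4, 3, requires_grad=True)
labels = torch.tensor([0, 2, 1, 0])
loss = soft_target_kd_loss(student_logits, ensemble_soft_targets(teachers), labels)
loss.backward()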

Adaptive contrastive knowledge distillation for BERT compression

J Guo, J Liu, Z Wang, Y Ma, R Gong… - Findings of the …, 2023 - aclanthology.org
In this paper, we propose a new knowledge distillation approach called adaptive contrastive
knowledge distillation (ACKD) for BERT compression. Different from existing knowledge …