Self-knowledge distillation in natural language processing

S Hahn, H Choi - arXiv preprint arXiv:1908.01851, 2019 - arxiv.org
Since deep learning became a key player in natural language processing (NLP), many deep
learning models have shown remarkable performance on a variety of NLP tasks …
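
For reference, the soft-target objective that the distillation methods in this list build on can be written as a temperature-scaled KL term between teacher and student outputs. The PyTorch sketch below is a generic illustration with placeholder names, not the implementation of any particular paper here.

    import torch
    import torch.nn.functional as F

    def soft_target_kd_loss(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor,
                            temperature: float = 2.0) -> torch.Tensor:
        # Hinton-style distillation loss: KL between temperature-softened teacher
        # and student distributions, scaled by T^2 to keep gradient magnitudes
        # comparable across temperatures.
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    # Example usage with random logits (batch of 4, 3-way classification).
    student_logits = torch.randn(4, 3)
    teacher_logits = torch.randn(4, 3)
    loss = soft_target_kd_loss(student_logits, teacher_logits, temperature=4.0)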

LRC-BERT: latent-representation contrastive knowledge distillation for natural language understanding

H Fu, S Zhou, Q Yang, J Tang, G Liu, K Liu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Pre-trained models such as BERT have achieved great results on various natural
language processing problems. However, a large number of parameters need significant …

TextBrewer: An open-source knowledge distillation toolkit for natural language processing

Z Yang, Y Cui, Z Chen, W Che, T Liu, S Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit
designed for natural language processing. It works with different neural network models and …

Improving multi-task deep neural networks via knowledge distillation for natural language understanding

X Liu, P He, W Chen, J Gao - arXiv preprint arXiv:1904.09482, 2019 - arxiv.org
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural
Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural …
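
This line of work distills an ensemble of task-specific teachers into a single student by averaging their output distributions and mixing soft and hard targets. The sketch below illustrates that averaging step with generic placeholder names; it is an assumption-laden illustration, not the authors' code.

    import torch
    import torch.nn.functional as F

    def ensemble_soft_targets(teacher_logits_list, temperature: float = 1.0):
        # Average the softened probability distributions of several teachers to
        # produce the soft targets used to train a single student.
        probs = [F.softmax(logits / temperature, dim=-1) for logits in teacher_logits_list]
        return torch.stack(probs, dim=0).mean(dim=0)

    def student_loss(student_logits, soft_targets, hard_labels, alpha=0.5):
        # Mix cross-entropy on the hard labels with cross-entropy on the ensemble
        # soft targets (a common formulation for multi-teacher distillation).
        ce_hard = F.cross_entropy(student_logits, hard_labels)
        ce_soft = -(soft_targets * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()
        return alpha * ce_hard + (1.0 - alpha) * ce_soft

    # Example: three teachers, batch of 2, 4 classes.
    teachers = [torch.randn(2, 4) for _ in range(3)]
    targets = ensemble_soft_targets(teachers, temperature=2.0)
    loss = student_loss(torch.randn(2, 4), targets, torch.tensor([1, 3]))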

A survey of knowledge enhanced pre-trained models

J Yang, X Hu, G Xiao, Y Shen - arXiv preprint arXiv:2110.00269, 2021 - arxiv.org
Pre-trained language models learn informative word representations on a large-scale text
corpus through self-supervised learning, which has achieved promising performance in …

Reinforced multi-teacher selection for knowledge distillation

F Yuan, L Shou, J Pei, W Lin, M Gong, Y Fu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
In natural language processing (NLP) tasks, slow inference speed and a large GPU memory
footprint remain the bottleneck for applying pre-trained deep models in production. As a …
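
The paper frames per-sample teacher selection as a reinforcement-learning problem. The sketch below is only a simplified stand-in that picks, for each example, the teacher most confident in the gold label, to make the multi-teacher setting concrete; it does not implement the paper's RL policy.

    import torch
    import torch.nn.functional as F

    def select_teacher_per_example(teacher_logits_list, labels):
        # Pick, for each example, the teacher assigning the highest probability to
        # the gold label, and return that teacher's soft targets. Simplified
        # heuristic only; the paper learns this selection with RL.
        probs = torch.stack([F.softmax(logits, dim=-1) for logits in teacher_logits_list], dim=0)
        gold = labels.view(1, -1, 1).expand(probs.size(0), -1, 1)   # (teachers, batch, 1)
        gold_probs = probs.gather(-1, gold).squeeze(-1)             # (teachers, batch)
        best = gold_probs.argmax(dim=0)                             # best teacher per example
        return probs[best, torch.arange(labels.size(0))]            # (batch, classes)

    # Example: two teachers, batch of 3, 5 classes.
    soft_targets = select_teacher_per_example(
        [torch.randn(3, 5), torch.randn(3, 5)], torch.tensor([0, 2, 4]))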

Greedy-layer pruning: Speeding up transformer models for natural language processing

D Peer, S Stabinger, S Engl… - Pattern Recognition …, 2022 - Elsevier
Fine-tuning transformer models after unsupervised pre-training achieves very high
performance on many different natural language processing tasks. Unfortunately …
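
Greedy layer pruning removes one transformer layer at a time, keeping the removal that hurts a validation metric the least. A minimal Python sketch of a single greedy step follows, assuming a user-supplied scoring function; names are illustrative, not the authors' implementation.

    def greedy_prune_one_layer(layers, evaluate):
        # One greedy step: try removing each layer in turn and keep the removal
        # that leaves the highest validation score. `layers` is an ordered list of
        # layer modules (or ids); `evaluate` scores a candidate layer list.
        best_score, best_layers, best_idx = float("-inf"), None, None
        for i in range(len(layers)):
            candidate = layers[:i] + layers[i + 1:]
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_layers, best_idx = score, candidate, i
        return best_layers, best_idx, best_score

    # Toy usage: "layers" are just ids and the metric happens to prefer dropping
    # the highest-id layer (a stand-in for a real validation evaluation).
    layers = list(range(12))
    pruned, dropped, score = greedy_prune_one_layer(layers, evaluate=lambda ls: -sum(ls))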

Pretrained encyclopedia: Weakly supervised knowledge-pretrained language model

W Xiong, J Du, WY Wang, V Stoyanov - arXiv preprint arXiv:1912.09637, 2019 - arxiv.org
Recent breakthroughs in pretrained language models have shown the effectiveness of
self-supervised learning for a wide range of natural language processing (NLP) tasks. In addition …

Knowledge distillation across ensembles of multilingual models for low-resource languages

J Cui, B Kingsbury, B Ramabhadran… - … , Speech and Signal …, 2017 - ieeexplore.ieee.org
This paper investigates the effectiveness of knowledge distillation in the context of
multilingual models. We show that with knowledge distillation, Long Short-Term Memory …

Rethinking Kullback-Leibler divergence in knowledge distillation for large language models

T Wu, C Tao, J Wang, R Yang, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to
compress Large Language Models (LLMs). Contrary to prior assertions that reverse …
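
The question raised here is which direction of the KL divergence to minimize during distillation. The generic PyTorch sketch below computes both directions from teacher and student logits (illustrative code, not the paper's); the comments state the conventional mean-seeking/mode-seeking intuitions that the paper revisits.

    import torch
    import torch.nn.functional as F

    def forward_kl(student_logits, teacher_logits):
        # Forward KL, KL(teacher || student): conventionally described as
        # mean-seeking, pushing the student to cover all teacher modes.
        log_p_student = F.log_softmax(student_logits, dim=-1)
        p_teacher = F.softmax(teacher_logits, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

    def reverse_kl(student_logits, teacher_logits):
        # Reverse KL, KL(student || teacher): conventionally described as
        # mode-seeking, concentrating the student on high-probability regions.
        log_p_teacher = F.log_softmax(teacher_logits, dim=-1)
        p_student = F.softmax(student_logits, dim=-1)
        return F.kl_div(log_p_teacher, p_student, reduction="batchmean")

    # Example with token-level logits: batch of 2, vocabulary of 8.
    s, t = torch.randn(2, 8), torch.randn(2, 8)
    fkl, rkl = forward_kl(s, t), reverse_kl(s, t)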