Language model pre-training, as exemplified by BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are …
J. Li, X. Liu, H. Zhao, R. Xu, M. Yang, Y. Jin. arXiv preprint arXiv:2010.06133, 2020.
Pre-trained language models (e.g., BERT) have achieved significant success in various natural language processing (NLP) tasks. However, high storage and computational costs …
C. Wu, F. Wu, Y. Huang. arXiv preprint arXiv:2106.01023, 2021.
Pre-trained language models (PLMs) have achieved great success in NLP. However, their huge model sizes hinder their application in many practical systems. Knowledge distillation is a …
BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks …
Pre-trained language models have been applied to various NLP tasks with considerable performance gains. However, the large model sizes, together with the long inference time …
Knowledge distillation has been shown to be a powerful model compression approach to facilitate the deployment of pre-trained language models in practice. This paper focuses on …
Pre-trained language models have demonstrated superior performance in various natural language processing tasks. However, these models usually contain hundreds of millions of …
Deep and large pre-trained language models are the state of the art for various natural language processing tasks. However, the huge size of these models could be a deterrent to …
Knowledge distillation (KD) has proved effective for compressing large-scale pre-trained language models. However, existing methods conduct KD statically, e.g., the student …
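The excerpts above all build on the same underlying idea of distilling a large teacher model into a smaller student. As a point of reference, below is a minimal sketch of the standard soft-label distillation objective (temperature-scaled KL term blended with the hard-label cross-entropy); it is not the specific method of any paper listed here, and the tensor names, temperature, and mixing weight are illustrative assumptions.

```python
# Minimal sketch of a generic knowledge-distillation loss, assuming PyTorch.
# Not the objective of any particular paper excerpted above; names and
# hyperparameter values (temperature, alpha) are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher -> student) with the usual
    hard-label cross-entropy on the ground-truth labels."""
    # Soften both distributions with the temperature before comparing them.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence, rescaled by T^2 so gradient magnitudes stay comparable.
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the hard labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example: a batch of 8 examples over a 3-way classification head.
student_logits = torch.randn(8, 3, requires_grad=True)
teacher_logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In practice the teacher logits come from a frozen pre-trained model run in inference mode, while only the student parameters receive gradients; many of the papers above extend this basic recipe with intermediate-layer or attention-based matching terms.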