The development of over-parameterized pre-trained language models has made a significant contribution toward the success of natural language processing. While over …
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream …
MB Noach, Y Goldberg - Proceedings of the 1st Conference of the …, 2020 - aclanthology.org
Large pre-trained language models reach state-of-the-art results on many different NLP tasks when fine-tuned individually; they also come with a significant memory and …
S Wang, C Wang, J Gao, Z Qi, H Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
This study proposes a knowledge distillation algorithm based on large language models and feature alignment, aiming to effectively transfer the knowledge of large pre-trained …
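The snippet above names knowledge distillation with feature alignment but gives no details. As a rough illustration only, not the paper's method, the sketch below combines a supervised loss, a soft-label distillation term, and an MSE alignment on intermediate hidden states; the temperature, loss weights, and layer mapping are all assumptions.

```python
import torch
import torch.nn.functional as F

def feature_alignment_kd_loss(student_out, teacher_out, labels,
                              temperature=2.0, alpha=0.5, beta=0.1):
    """Illustrative KD objective (hyperparameters assumed):
    cross-entropy on task labels, KL divergence on temperature-
    softened logits, and MSE alignment of paired hidden states."""
    # Supervised loss on the downstream task labels.
    ce = F.cross_entropy(student_out["logits"], labels)

    # Soft-label distillation: match the teacher's output distribution.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_out["logits"] / t, dim=-1),
        F.softmax(teacher_out["logits"] / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # Feature alignment: MSE between paired hidden states.
    # Assumes equal hidden sizes and a student with half the layers
    # (hence the [::2] mapping); in practice a learned projection
    # and a task-specific layer mapping are common.
    align = 0.0
    for s_h, t_h in zip(student_out["hidden_states"],
                        teacher_out["hidden_states"][::2]):
        align = align + F.mse_loss(s_h, t_h)

    return ce + alpha * kd + beta * align
```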
Pre-trained language models have been applied to various NLP tasks with considerable performance gains. However, the large model sizes, together with the long inference time …
GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state …
J Ko, S Park, M Jeong, S Hong, E Ahn… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation (KD) is a highly promising method for mitigating the computational problems of pre-trained language models (PLMs). Among various KD approaches …
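KD is described here only at a high level; the generic training-step structure that most KD approaches for PLMs share is a frozen teacher run without gradients alongside a trainable student. The sketch below shows that structure under the assumption of Hugging Face-style model outputs and reuses the hypothetical loss from the earlier sketch; model, batch, and loss names are placeholders.

```python
import torch

def distillation_step(student, teacher, batch, optimizer, loss_fn):
    """One generic KD training step: the teacher only supplies targets
    and is never updated; only the student's parameters receive
    gradients. `loss_fn` is the hypothetical combined loss above."""
    teacher.eval()
    with torch.no_grad():  # teacher forward pass, no gradients tracked
        teacher_out = teacher(**batch, output_hidden_states=True)

    student_out = student(**batch, output_hidden_states=True)
    loss = loss_fn(student_out, teacher_out, batch["labels"])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```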
Pre-trained language models have demonstrated superior performance in various natural language processing tasks. However, these models usually contain hundreds of millions of …