Knowledge distillation of transformer-based language models revisited

C Lu, J Zhang, Y Chu, Z Chen, J Zhou, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
In the past few years, transformer-based pre-trained language models have achieved
astounding success in both industry and academia. However, the large model size and high …