C. Liang, H. Jiang, Z. Li, X. Tang, B. Yin, T. Zhao. arXiv e-prints, 2023.
Abstract: Knowledge distillation has been shown to be a powerful model compression
approach to facilitate the deployment of pre-trained language models in practice. This paper …
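As background for the abstract's mention of knowledge distillation, the sketch below shows a generic temperature-scaled distillation loss (soft targets from a teacher combined with cross-entropy on hard labels). It is not the method proposed in the cited paper; the function name, temperature, and weighting are illustrative assumptions.

```python
# Minimal sketch of a generic knowledge-distillation loss, NOT the cited
# paper's method. Names and hyperparameter values are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a temperature-scaled KL term (teacher soft targets)
    with standard cross-entropy on the ground-truth labels."""
    # Soft-target term: KL divergence between temperature-softened distributions.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Hard-label term: usual cross-entropy on the student's logits.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Toy usage: random logits for a batch of 4 examples and 10 classes.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```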