K Huang, X Guo, M Wang - … of the 37th International Conference on …, 2023 - dl.acm.org
Knowledge Distillation (KD) has emerged as a promising approach for compressing large
Pre-trained Language Models (PLMs). The performance of KD relies on how to effectively …