C Wu, F Wu, Y Huang - arXiv preprint arXiv:2106.01023, 2021 - arxiv.org
Pre-trained language models (PLMs) have achieved great success in NLP. However, their huge model sizes hinder their application in many practical systems. Knowledge distillation is a …
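The snippet is cut off before describing the method, so the following is only a minimal, generic sketch of the standard knowledge-distillation objective (soft teacher targets blended with hard labels, after Hinton et al., 2015) that PLM-compression work of this kind typically builds on. The function name and the temperature and alpha values are illustrative assumptions, not a reconstruction of this paper's specific approach.

```python
# Generic knowledge-distillation loss sketch (Hinton et al., 2015).
# NOT this paper's method; names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,   # assumed value
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft loss against the teacher's output distribution
    with the usual hard cross-entropy against the gold labels."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients match the hard loss
    # Hard targets: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In training, the small student model minimizes this loss while the large teacher's logits are computed under torch.no_grad(), so only the student's parameters are updated.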