Multi-level distillation of semantic knowledge for pre-training multilingual language model

M Li, F Ding, D Zhang, L Cheng, H Hu, F Luo - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained multilingual language models play an important role in cross-lingual natural
language understanding tasks. However, existing methods did not focus on learning the …

Preview-based Category Contrastive Learning for Knowledge Distillation

M Ding, J Wu, X Dong, X Li, P Qin, T Gan… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation is a mainstream model-compression algorithm that transfers
knowledge from a larger model (teacher) to a smaller model (student) to improve the …
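
For context, below is a minimal sketch of the standard teacher-student distillation objective (Hinton-style soft targets blended with hard-label cross-entropy), which both papers build on. The function name, temperature, and mixing weight are illustrative assumptions, not taken from either paper.

```python
# Minimal sketch of a teacher-student distillation loss (illustrative only).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term (teacher -> student) with hard-label CE."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Example: a batch of 8 examples over 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```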

[PDF] Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model

L Mingqi, F Ding, D Zhang, L Cheng, H Hu… - Empirical Methods in …, 2022 - par.nsf.gov
Pre-trained multilingual language models play an important role in cross-lingual natural
language understanding tasks. However, existing methods did not focus on learning the …