Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model

Authors

Mingqi Li, Fei Ding, Dan Zhang, Long Cheng, Hongxin Hu, Feng Luo

Publication date

2022/1

Journal

Empirical Methods in Natural Language Processing

Abstract

Pre-trained multilingual language models play an important role in cross-lingual natural language understanding tasks. However, existing methods do not focus on learning the semantic structure of representations, which limits their performance. In this paper, we propose Multi-level Multilingual Knowledge Distillation (MMKD), a novel method for improving multilingual language models. Specifically, we employ a teacher-student framework to transfer the rich semantic representation knowledge of English BERT to a multilingual student model. We propose token-, word-, sentence-, and structure-level alignment objectives that encourage consistency at multiple levels between source-target pairs, as well as correlation similarity between the teacher and student models. We conduct experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD. Experimental results show that MMKD outperforms other baseline models of similar size on XNLI and XQuAD and obtains comparable performance on PAWS-X. In particular, MMKD obtains significant performance gains on low-resource languages.
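The abstract names four alignment levels but does not give their loss functions. As a rough illustration only, the sketch below assumes each of the token, word, and sentence levels is a mean-squared error between teacher (English BERT) and student representations of aligned English/target-language text, with mean pooling over subword spans for words and over all tokens for sentences. For the structure level it assumes that "correlation similarity" can be approximated by comparing the teacher's and student's pairwise cosine-similarity matrices over a batch of sentences. The names `ParallelPair` and `multi_level_loss`, the given token and word alignments, the pooling choices, and the equal default weights are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # half-open, non-empty token range [start, end)


@dataclass
class ParallelPair:
    """Hypothetical English/target-language training pair (not the paper's format)."""
    teacher_tokens: List[Vector]  # assumed frozen English BERT outputs, English side
    student_tokens: List[Vector]  # assumed multilingual student outputs, target side
    token_pairs: List[Tuple[int, int]]  # assumed (teacher_idx, student_idx) alignments
    word_pairs: List[Tuple[Span, Span]]  # assumed (teacher_span, student_span) alignments


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average token vectors into a single word or sentence vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined here as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> Vector:
    """Flattened pairwise cosine-similarity matrix of a batch of vectors."""
    return [cosine(u, v) for u in vectors for v in vectors]


def multi_level_loss(batch: List[ParallelPair],
                     weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of token-, word-, sentence-, and structure-level losses."""
    w_tok, w_word, w_sent, w_struct = weights
    tok = word = sent = 0.0
    t_sents: List[Vector] = []
    s_sents: List[Vector] = []
    for p in batch:
        # Token level: match representations of aligned tokens.
        tok += sum(mse(p.teacher_tokens[i], p.student_tokens[j])
                   for i, j in p.token_pairs) / len(p.token_pairs)
        # Word level: match mean-pooled subword spans of aligned words.
        word += sum(mse(mean_pool(p.teacher_tokens[ts:te]),
                        mean_pool(p.student_tokens[ss:se]))
                    for (ts, te), (ss, se) in p.word_pairs) / len(p.word_pairs)
        # Sentence level: match mean-pooled sentence representations.
        t_vec = mean_pool(p.teacher_tokens)
        s_vec = mean_pool(p.student_tokens)
        sent += mse(t_vec, s_vec)
        t_sents.append(t_vec)
        s_sents.append(s_vec)
    n = len(batch)
    # Structure level: compare how sentences relate to each other within the
    # batch (pairwise similarities), rather than the raw vectors themselves.
    struct = mse(similarity_matrix(t_sents), similarity_matrix(s_sents))
    return (w_tok * tok + w_word * word + w_sent * sent) / n + w_struct * struct
```

For example, a batch in which the student reproduces the teacher's vectors exactly gives a loss of zero, while any mismatch raises the loss at every level it affects. Keeping the structure-level term separate from the vector-matching terms reflects the idea that a student can match the teacher's relational geometry even when the individual vectors differ.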
