Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

C Wang, Y Lu, Y Mu, Y Hu, T Xiao, J Zhu - Findings of EMNLP, 2022 (aclanthology.org); also arXiv preprint arXiv:2302.00444, 2023 (arxiv.org)
Knowledge distillation addresses the problem of transferring knowledge from a teacher
model to a student model. In this process, we typically have multiple types of knowledge …
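The snippet above only sketches the general teacher-student setup. For reference, below is a minimal sketch of the standard soft-label distillation objective (temperature-scaled KL to the teacher plus cross-entropy on gold labels); this is not the knowledge-selection method proposed in the paper, and the function name, temperature, and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Standard KD loss: soft KL term against the teacher's tempered
    distribution combined with hard cross-entropy on gold labels.
    (Illustrative sketch, not the paper's knowledge-selection method.)"""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(log_probs, soft_targets,
                         reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```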
