Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation

Y Wu, M Rezagholizadeh, A Ghaddar… - Proceedings of the …, 2021 - aclanthology.org
Intermediate layer matching is shown as an effective approach for improving knowledge
distillation (KD). However, this technique applies matching in the hidden spaces of two …

Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation

Y Wu, M Rezagholizadeh, A Ghaddar, MA Haidar… - scholar.archive.org