RW-KD: Sample-wise Loss Terms Re-Weighting for Knowledge Distillation

P Lu, A Ghaddar, A Rashid… - Findings of the …, 2021 - aclanthology.org
Abstract: Knowledge Distillation (KD) is extensively used in Natural Language Processing to
compress the pre-training and task-specific fine-tuning phases of large neural language …
