P Lu, A Ghaddar, A Rashid, M Rezagholizadeh… - pdfs.semanticscholar.org
Abstract: Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language …
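The abstract refers to Knowledge Distillation as a compression technique. As a point of reference (not taken from this paper), the standard KD objective trains a small student to match the temperature-softened output distribution of a large teacher; the sketch below is a minimal, dependency-free illustration of that loss, with the `T**2` scaling conventionally applied to keep gradient magnitudes comparable across temperatures.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(l / T for l in logits)  # subtract max for numerical stability
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target distillation loss: cross-entropy between the
    teacher's and student's temperature-softened distributions,
    scaled by T^2 (the usual convention in the KD literature)."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # soft student predictions
    return -T * T * sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

In practice this soft-target term is combined with the ordinary hard-label cross-entropy, weighted by a mixing coefficient; the names and defaults above (`T=2.0`, `kd_loss`) are illustrative, not from the paper.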