Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

I Timiryasov, JL Tastet - arXiv preprint arXiv:2308.02019, 2023 - arxiv.org
We present our proposed solution to the BabyLM challenge [arXiv:2301.11796], whose goal
was to improve the sample efficiency of language models. We trained an ensemble …

Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

I Timiryasov, JL Tastet - Proceedings of the BabyLM Challenge at …, 2023 - aclanthology.org