Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

I Timiryasov, JL Tastet - arXiv preprint arXiv:2308.02019, 2023 - arxiv.org
We present our proposed solution to the BabyLM challenge [arXiv:2301.11796], whose goal
was to improve the sample efficiency of language models. We trained an ensemble …

Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

I Timiryasov, JL Tastet - Proceedings of the BabyLM Challenge at …, 2023 - aclanthology.org