What Language Model to Train if You Have One Million GPU Hours?

T Le Scao, T Wang, D Hesslow, L Saulnier… - arXiv preprint arXiv …, 2022 - arxiv.org
The crystallization of modeling methods around the Transformer architecture has been a
boon for practitioners. Simple, well-motivated architectural variations can transfer across …

What Language Model to Train if You Have One Million GPU Hours?

T Le Scao, T Wang, D Hesslow, S Bekman… - Findings of the …, 2022 - aclanthology.org
The crystallization of modeling methods around the Transformer architecture has been a
boon for practitioners. Simple, well-motivated architectural variations can transfer across …

What Language Model to Train if You Have One Million GPU Hours?

T Le Scao, T Wang, D Hesslow, L Saulnier… - … & Perspectives in … - openreview.net
The crystallization of modeling methods around the Transformer architecture has been a
boon for practitioners. Simple, well-motivated architectural variations that transfer across …

What Language Model to Train if You Have One Million GPU Hours?

T Le Scao, T Wang, D Hesslow, L Saulnier… - arXiv e …, 2022 - ui.adsabs.harvard.edu
The crystallization of modeling methods around the Transformer architecture has been a
boon for practitioners. Simple, well-motivated architectural variations can transfer across …