D Ganguli, D Hernandez, L Lovitt,
A Askell… - Proceedings of the …, 2022 - dl.acm.org
Large-scale pre-training has recently emerged as a technique for creating capable, general-
purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many …