We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient …
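As a rough illustration of the program-search framing (not the paper's actual search space, mutation operators, or proxy tasks), one can represent each candidate optimizer update rule as a small program and rank candidates by how well they minimise a cheap proxy objective:

```python
# Hypothetical sketch: treat an optimizer update rule as a tiny program over
# (gradient, momentum state) and "search" by scoring candidates on a cheap
# proxy task. The candidate set and proxy task are illustrative only.
import numpy as np

# Each program maps (gradient, momentum) -> (parameter update, new momentum).
CANDIDATES = {
    "sgd":           lambda g, m: (-0.1 * g, m),
    "momentum":      lambda g, m: (-0.1 * (0.9 * m + g), 0.9 * m + g),
    "sign_sgd":      lambda g, m: (-0.01 * np.sign(g), m),
    "sign_momentum": lambda g, m: (-0.01 * np.sign(0.9 * m + 0.1 * g), 0.9 * m + 0.1 * g),
}

def proxy_loss(update_fn, steps=200, dim=20, seed=0):
    """Cheap proxy task: minimise a fixed random convex quadratic."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(dim, dim))
    H = A.T @ A / dim + np.eye(dim)   # positive-definite Hessian
    x = rng.normal(size=dim)
    m = np.zeros(dim)
    for _ in range(steps):
        g = H @ x                     # gradient of 0.5 * x^T H x
        step, m = update_fn(g, m)
        x = x + step
    return 0.5 * x @ H @ x

# The "search" here is just ranking the fixed candidates by proxy performance;
# the actual method evolves programs rather than enumerating a hand-written set.
scores = {name: proxy_loss(fn) for name, fn in CANDIDATES.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} proxy loss = {score:.4f}")
```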
Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference …
Large language models (LMs) are able to in-context learn: they perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for …
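A minimal sketch of this in-context learning setup, using a made-up sentiment task: the demonstrations are concatenated into a prompt and the frozen model is asked to complete the label for a new input.

```python
# Build an in-context learning prompt from a few input-label demonstrations.
# The demonstrations and label space here are invented for illustration.
demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regretted buying a ticket.", "negative"),
    ("A masterclass in suspense.", "positive"),
]
test_input = "The plot went nowhere and the acting was flat."

def build_icl_prompt(demos, query):
    """Concatenate (input, label) pairs, then the query with its label left blank."""
    lines = [f"Review: {x}\nSentiment: {y}" for x, y in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_icl_prompt(demonstrations, test_input)
print(prompt)
# The prompt is then given to a frozen language model; the prediction is the
# label ("positive" or "negative") whose continuation the model scores highest,
# with no parameter updates at any point.
```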
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in …
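One hedged reading of how such meta-training instances could be constructed (the task names, k, and formatting below are illustrative assumptions, not MetaICL's exact recipe): sample k input-label pairs from a training task as demonstrations plus one held-out pair as the target, and tune the LM to produce the target label given the concatenation.

```python
# Sketch of meta-training data construction for in-context learning:
# each instance is k demonstrations plus one held-out target from one task.
import random

random.seed(0)
K = 4  # assumed number of demonstrations per meta-training instance

# Toy stand-ins for a large collection of meta-training tasks.
META_TRAIN_TASKS = {
    "sentiment": [("great film", "positive"), ("dull plot", "negative"),
                  ("loved it", "positive"), ("waste of time", "negative"),
                  ("stunning visuals", "positive"), ("poor pacing", "negative")],
    "topic": [("the striker scored twice", "sports"), ("rates rose again", "finance"),
              ("new GPU announced", "tech"), ("parliament passed the bill", "politics"),
              ("the orchestra premiered a symphony", "arts"),
              ("the team won the league", "sports")],
}

def sample_meta_instance(tasks):
    """One meta-training instance: K demonstrations + a held-out target from one task."""
    task = random.choice(list(tasks))
    examples = random.sample(tasks[task], K + 1)
    demos, (x, y) = examples[:K], examples[K]
    context = "\n".join(f"Input: {a}\nLabel: {b}" for a, b in demos)
    return f"{context}\nInput: {x}\nLabel:", y  # the LM is trained to emit y here

prompt, target = sample_meta_instance(META_TRAIN_TASKS)
print(prompt, "->", target)
```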
We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating …
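A hedged sketch of how accuracy on such a multilingual math benchmark is typically scored: extract the final number from each generated solution and compare it to the gold answer per language. The extraction heuristic and the example records below are illustrative, not the benchmark's official evaluation code.

```python
# Per-language exact-match scoring for numeric answers (illustrative only).
import re
from collections import defaultdict

def extract_final_number(generation: str):
    """Take the last number appearing in the model's reasoning as its answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return float(numbers[-1]) if numbers else None

# (language, gold answer, model generation) -- made-up records for illustration.
records = [
    ("de", 18.0, "Es sind 3 * 6 = 18 Äpfel. Die Antwort ist 18."),
    ("sw", 42.0, "Jibu ni 40."),
    ("en", 7.0, "She has 12 - 5 = 7 left. The answer is 7."),
]

correct, total = defaultdict(int), defaultdict(int)
for lang, gold, gen in records:
    total[lang] += 1
    correct[lang] += int(extract_final_number(gen) == gold)

for lang in sorted(total):
    print(f"{lang}: {correct[lang]}/{total[lang]} correct")
```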
Despite this success, the process of fine-tuning large-scale pretrained language models (PLMs) brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining …
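One common way to sidestep full fine-tuning, shown here purely as an illustration of parameter-efficient adaptation (a LoRA-style low-rank delta; the specific technique and sizes are assumptions, not necessarily this paper's method): freeze the pretrained weights and train only a small per-task module.

```python
# Freeze the pretrained weights and learn a small low-rank "delta" per task.
import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    """Wraps a frozen linear layer and adds a trainable low-rank update."""
    def __init__(self, frozen: nn.Linear, rank: int = 8):
        super().__init__()
        self.frozen = frozen
        for p in self.frozen.parameters():
            p.requires_grad_(False)                    # pretrained weights stay fixed
        d_out, d_in = frozen.weight.shape
        self.down = nn.Linear(d_in, rank, bias=False)  # trainable
        self.up = nn.Linear(rank, d_out, bias=False)   # trainable
        nn.init.zeros_(self.up.weight)                 # start as an exact no-op

    def forward(self, x):
        return self.frozen(x) + self.up(self.down(x))

pretrained = nn.Linear(1024, 1024)   # stand-in for one weight matrix of a PLM
adapted = LowRankDelta(pretrained, rank=8)

trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"trainable params: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

Only the low-rank factors need to be stored per task, which is what makes retaining a separate adaptation for every downstream task affordable.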
T. Schick, H. Schütze. arXiv preprint arXiv:2009.07118, 2020.
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …
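Schick and Schütze's line of work reformulates a task as a cloze question that a masked language model can fill in, which is how much smaller models become competitive few-shot learners. The sketch below uses a made-up pattern and verbalizer to show the idea; the full method combines several patterns and distils their predictions.

```python
# Cloze-style reformulation: a pattern turns the input into a fill-in-the-blank
# question, and a verbalizer maps each label to a word the masked LM can predict.
PATTERN = "{text} All in all, it was [MASK]."                # illustrative template
VERBALIZER = {"positive": "great", "negative": "terrible"}   # label -> filler word

def to_cloze(text: str) -> str:
    """Turn a raw input into a cloze question for a masked LM."""
    return PATTERN.format(text=text)

def label_from_mask_scores(mask_word_scores: dict) -> str:
    """Pick the label whose verbalizer word the model scores highest at [MASK]."""
    return max(VERBALIZER,
               key=lambda lbl: mask_word_scores.get(VERBALIZER[lbl], float("-inf")))

print(to_cloze("The food was cold and the service slow."))
# The scores below would come from a masked LM's distribution at the [MASK] token.
print(label_from_mask_scores({"great": 0.02, "terrible": 0.81}))
```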
In this paper, we introduce NLP resources for 11 major Indian languages from two major language families. These resources include: (a) large-scale sentence-level monolingual …
Large language models are commonly trained on a mixture of filtered web data and curated "high-quality" corpora, such as social media conversations, books, or technical …
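A toy illustration of the filter-and-deduplicate idea behind web-only pretraining corpora. The heuristics below (a length threshold, a symbol ratio, exact-hash deduplication) are stand-ins; production pipelines use far richer quality filters and fuzzy deduplication.

```python
# Minimal web-corpus cleaning sketch: heuristic quality filter + exact dedup.
import hashlib

def quality_filter(doc: str) -> bool:
    """Keep documents that look like natural prose rather than boilerplate."""
    if len(doc.split()) < 5:                    # too short to be useful
        return False
    alpha_ratio = sum(c.isalpha() or c.isspace() for c in doc) / len(doc)
    return alpha_ratio > 0.8                    # mostly letters, not markup/symbols

def deduplicate(docs):
    """Drop exact duplicates by hashing normalised text."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

raw = [
    "The committee published its findings on climate adaptation this week.",
    "The committee published its findings on climate adaptation this week.",
    "click here >>> {{ ad_slot_3 }} <<<",
    "Researchers described a new method for training language models on web text.",
]
corpus = [d for d in deduplicate(raw) if quality_filter(d)]
print(len(corpus), "documents kept")
```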