QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Symbolic discovery of optimization algorithms

X Chen, C Liang, D Huang, E Real… - Advances in neural …, 2024 - proceedings.neurips.cc
We present a method to formulate algorithm discovery as program search, and apply it to
discover optimization algorithms for deep neural network training. We leverage efficient …

Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Rethinking the role of demonstrations: What makes in-context learning work?

S Min, X Lyu, A Holtzman, M Artetxe, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LMs) are able to in-context learn -- perform a new task via inference
alone by conditioning on a few input-label pairs (demonstrations) and making predictions for …

MetaICL: Learning to learn in context

S Min, M Lewis, L Zettlemoyer, H Hajishirzi - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training
framework for few-shot learning where a pretrained language model is tuned to do in …

Language models are multilingual chain-of-thought reasoners

F Shi, M Suzgun, M Freitag, X Wang, S Srivats… - arXiv preprint arXiv …, 2022 - arxiv.org
We evaluate the reasoning abilities of large language models in multilingual settings. We
introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating …

Delta tuning: A comprehensive study of parameter-efficient methods for pre-trained language models

N Ding, Y Qin, G Yang, F Wei, Z Yang, Y Su… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive
adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining …

It's not just size that matters: Small language models are also few-shot learners

T Schick, H Schütze - arXiv preprint arXiv:2009.07118, 2020 - arxiv.org
When scaled to hundreds of billions of parameters, pretrained language models such as
GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …

IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages

D Kakwani, A Kunchukuttan, S Golla… - Findings of the …, 2020 - aclanthology.org
In this paper, we introduce NLP resources for 11 major Indian languages from two major
language families. These resources include: (a) large-scale sentence-level monolingual …

The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only

G Penedo, Q Malartic, D Hesslow… - Advances in …, 2023 - proceedings.neurips.cc
Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical …