A survey on in-context learning

Q Dong, L Li, D Dai, C Zheng, Z Wu, B Chang… - arXiv preprint arXiv …, 2022 - arxiv.org
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has
become a new paradigm for natural language processing (NLP), where LLMs make …

Rethinking the role of demonstrations: What makes in-context learning work?

S Min, X Lyu, A Holtzman, M Artetxe, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LMs) are able to learn in context: they perform a new task via inference
alone by conditioning on a few input-label pairs (demonstrations) and making predictions for …
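
As a rough illustration of the setup this line of work studies (a minimal sketch under assumed details, not Min et al.'s experimental protocol), the snippet below conditions a small causal LM on a handful of input-label demonstrations and scores candidate labels for a new input. The model choice (gpt2), the sentiment task, and the prompt template are all assumptions for illustration:

```python
# Minimal in-context learning sketch: the model is never fine-tuned; the
# task is specified entirely through demonstrations in the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

demonstrations = [
    ("The movie was fantastic.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An absolute masterpiece.", "positive"),
]
query = "The plot made no sense at all."

# Concatenate input-label pairs, then append the unlabeled query.
prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
prompt += f"Review: {query}\nSentiment:"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Score the candidate label words by their next-token logits (using each
# label's first token), rather than sampling free-form text.
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_logits = model(**inputs).logits[0, -1]
labels = ["positive", "negative"]
first_ids = [tokenizer(" " + label).input_ids[0] for label in labels]
print(labels[next_logits[first_ids].argmax().item()])
```

Restricting the prediction to the label words' logits, rather than free-form generation, is one common way such ICL evaluations are scored.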

Large language models can be easily distracted by irrelevant context

F Shi, X Chen, K Misra, N Scales… - International …, 2023 - proceedings.mlr.press
Large language models have achieved impressive performance on various natural
language processing tasks. However, so far they have been evaluated primarily on …

Discovering latent knowledge in language models without supervision

C Burns, H Ye, D Klein, J Steinhardt - arXiv preprint arXiv:2212.03827, 2022 - arxiv.org
Existing techniques for training language models can be misaligned with the truth: if we train
models with imitation learning, they may reproduce errors that humans make; if we train …

Text and patterns: For effective chain of thought, it takes two to tango

A Madaan, A Yazdanbakhsh - arXiv preprint arXiv:2209.07686, 2022 - arxiv.org
The past decade has witnessed dramatic gains in natural language processing and an
unprecedented scaling of large language models. These developments have been …

Multilingual machine translation with large language models: Empirical results and analysis

W Zhu, H Liu, Q Dong, J Xu, S Huang, L Kong… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable potential in handling
multilingual machine translation (MMT). In this paper, we systematically investigate the …

Small models are valuable plug-ins for large language models

C Xu, Y Xu, S Wang, Y Liu, C Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are
often publicly unavailable, and their immense size makes the models difficult to tune with …

Function vectors in large language models

E Todd, ML Li, AS Sharma, A Mueller… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …
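
The snippet below is a heavily simplified sketch of that idea: it averages a last-token hidden state over a few antonym ICL prompts into a single vector, then adds it to the residual stream during a zero-shot run. The layer index, the extraction position, and the use of a plain hidden-state average (the paper instead locates the vector in attention heads via causal mediation) are all assumptions for illustration:

```python
# Toy "function vector" sketch. Illustrative only: it averages layer
# activations rather than performing the paper's causal analysis.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER = 6  # assumed layer for both extraction and injection

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at LAYER."""
    ids = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        states = model(**ids, output_hidden_states=True).hidden_states
    return states[LAYER][0, -1]

# Average over a few ICL prompts for one task (antonyms, as an example)
# to get a single vector that is meant to encode the task itself.
icl_prompts = [
    "hot -> cold\nbig -> small\nfast ->",
    "up -> down\nwet -> dry\nhard ->",
]
fv = torch.stack([last_token_state(p) for p in icl_prompts]).mean(dim=0)

# Inject the vector into the residual stream at LAYER during a zero-shot
# run; the hook fires at every generation step and shifts the last token.
def inject(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, -1] += fv
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(inject)
zero_shot = tokenizer("light ->", return_tensors="pt")
out = model.generate(**zero_shot, max_new_tokens=2,
                     pad_token_id=tokenizer.eos_token_id)
handle.remove()
print(tokenizer.decode(out[0]))
```

The point of the sketch is the mechanism being claimed: a single added vector can steer a model toward performing a task it was given no demonstrations for in the current prompt.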

Label words are anchors: An information flow perspective for understanding in-context learning

L Wang, L Li, D Dai, D Chen, H Zhou, F Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
In-context learning (ICL), in which large language models (LLMs) are given demonstration
examples to perform diverse tasks, has emerged as a promising capability. However …

Large language models are in-context semantic reasoners rather than symbolic reasoners

X Tang, Z Zheng, J Li, F Meng, SC Zhu, Y Liang… - arXiv preprint arXiv …, 2023 - arxiv.org
The emergent few-shot reasoning capabilities of Large Language Models (LLMs) have
excited the natural language and machine learning communities in recent years. Despite …