Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
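
As a rough illustration of the flipped-label setup named in this snippet: the demonstration labels are deliberately inverted while the query's true label is kept, so a model that overrides its semantic priors in favor of the in-context input-label mapping will also flip its predictions. The sentiment task, example texts, and prompt format below are illustrative assumptions, not the paper's exact protocol (Python sketch).

# Sketch: a flipped-label ICL prompt for binary sentiment classification.
FLIP = {"positive": "negative", "negative": "positive"}

demos = [
    ("A wonderful, heartfelt film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
    ("I would happily watch it again.", "positive"),
]

def build_flipped_prompt(demos, query_text):
    # Demonstration labels are inverted; the query label is left for the model to fill in.
    lines = [f"Review: {text}\nSentiment: {FLIP[label]}" for text, label in demos]
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_flipped_prompt(demos, "An absolute joy from start to finish.")
print(prompt)  # a model that follows the in-context mapping should now answer "negative"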

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - arXiv preprint arXiv:2306.09927, 2023 - arxiv.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
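
The in-context linear regression setting behind this line of work can be sketched as follows: each prompt is a sequence of pairs (x_i, y_i) with y_i = <w, x_i> for a task-specific weight vector w drawn fresh per prompt, and the model must predict the label of a held-out query. The dimensions and distributions below are illustrative assumptions, with an ordinary least-squares baseline that trained transformers are typically compared against (Python/NumPy sketch).

import numpy as np

def make_icl_regression_prompt(d=8, n_context=16, rng=None):
    # Sample one in-context linear-regression task: context pairs (x_i, y_i)
    # with y_i = <w, x_i> for a task-specific w, plus a query point.
    rng = rng or np.random.default_rng()
    w = rng.standard_normal(d)                 # fresh task vector per prompt
    X = rng.standard_normal((n_context + 1, d))
    y = X @ w
    context = list(zip(X[:-1], y[:-1]))        # observed (x_i, y_i) pairs
    query_x, query_y = X[-1], y[-1]            # the model must predict query_y
    return context, query_x, query_y

context, query_x, query_y = make_icl_regression_prompt()
# Least-squares baseline fitted on the context alone:
Xc = np.stack([x for x, _ in context]); yc = np.array([y for _, y in context])
w_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
print(float(query_x @ w_hat), float(query_y))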

Max-margin token selection in attention mechanism

D Ataee Tarzanagh, Y Li, X Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc
The attention mechanism is a central component of the transformer architecture that led to the
phenomenal success of large language models. However, the theoretical principles …
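
As a hedged illustration of the flavor of result announced here (paraphrased from memory, not the paper's exact statement): for a single attention layer choosing among tokens x_{i,1}, …, x_{i,T} of sequence i with query token z_i, the trained attention weights are related in direction to a max-margin program that separates the selected token from the rest, roughly of the form

\min_{W} \; \|W\|_F
\quad \text{s.t.} \quad
(x_{i,\alpha_i} - x_{i,t})^\top W z_i \;\ge\; 1
\qquad \forall\, t \neq \alpha_i, \;\; \forall\, i,

where \alpha_i indexes the token that the attention head should select in sequence i.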

Label words are anchors: An information flow perspective for understanding in-context learning

L Wang, L Li, D Dai, D Chen, H Zhou, F Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
In-context learning (ICL) emerges as a promising capability of large language models
(LLMs) when they are provided with demonstration examples to perform diverse tasks. However …
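
A toy sketch of the information-flow measurement suggested by the title: given a causal attention matrix over a prompt, compare how much attention mass the label-word positions aggregate from the preceding demonstration text with how much the final prediction position draws from those label words. The positions, random matrix, and scoring are placeholders, not the paper's saliency-based analysis (Python/NumPy sketch).

import numpy as np

def anchor_flow_scores(attn, label_positions, final_position):
    # attn: (T, T) row-stochastic causal attention matrix (row = query, column = key).
    text_to_labels = sum(attn[p, :p].sum() for p in label_positions)    # text -> label words
    labels_to_final = attn[final_position, label_positions].sum()       # label words -> prediction
    return text_to_labels, labels_to_final

T = 12
rng = np.random.default_rng(0)
A = np.tril(rng.random((T, T)))
A /= A.sum(axis=1, keepdims=True)   # normalize rows to mimic softmax attention
print(anchor_flow_scores(A, label_positions=[3, 7], final_position=T - 1))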

The mystery of in-context learning: A comprehensive survey on interpretation and analysis

Y Zhou, J Li, Y Xiang, H Yan, L Gui… - Proceedings of the 2024 …, 2024 - aclanthology.org
Understanding the in-context learning (ICL) capability that enables large language models
(LLMs) to perform tasks proficiently from demonstration examples is of utmost importance. This …

Are Emergent Abilities in Large Language Models just In-Context Learning?

S Lu, I Bigoulaeva, R Sachdeva, HT Madabushi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models have exhibited emergent abilities, demonstrating exceptional
performance across diverse tasks for which they were not explicitly trained, including those …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in "Attention Is All You Need", the transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …

What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization

Y Zhang, F Zhang, Z Yang, Z Wang - arXiv preprint arXiv:2305.19420, 2023 - arxiv.org
In this paper, we conduct a comprehensive study of In-Context Learning (ICL) by addressing
several open questions: (a) What type of ICL estimator is learned by large language …
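
The Bayesian model averaging view of ICL named in the title can be stated generically (standard form, not necessarily the paper's exact notation): with demonstrations D_n = {(x_i, y_i)}_{i=1}^n and a latent task variable \theta, the ICL prediction is read as a posterior predictive,

p(y \mid x, D_n) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid D_n)\, d\theta,
\qquad
p(\theta \mid D_n) \;\propto\; p(\theta) \prod_{i=1}^{n} p(y_i \mid x_i, \theta),

so adding demonstrations sharpens p(\theta \mid D_n) and hence the implicit average over tasks.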

Function vectors in large language models

E Todd, ML Li, AS Sharma, A Mueller… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …
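
A rough sketch of the kind of activation-extraction-and-injection procedure this snippet points at, using PyTorch forward hooks on a GPT-2-style model: average a hidden state at a chosen layer and the final token position over several ICL prompts for one task, then add that vector back in during a zero-shot forward pass. The layer index, the `model.transformer.h[...]` module path, and the last-token position are assumptions for this sketch, not the paper's exact recipe.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 8  # assumed intervention layer

def last_token_state(prompt, layer=LAYER):
    # Hidden state of the final token at `layer` (residual-stream proxy).
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer][0, -1, :]

# 1) Average over ICL prompts for one task (antonyms here) to get a candidate "function vector".
icl_prompts = ["hot -> cold\nbig -> small\nfast ->", "up -> down\nwet -> dry\nold ->"]
task_vec = torch.stack([last_token_state(p) for p in icl_prompts]).mean(dim=0)

# 2) Add it at the same layer while running a zero-shot prompt.
def add_vec_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1, :] += task_vec          # inject at the current last token
    return output

handle = model.transformer.h[LAYER - 1].register_forward_hook(add_vec_hook)
with torch.no_grad():
    out = model.generate(tok("tall ->", return_tensors="pt").input_ids, max_new_tokens=3)
handle.remove()
print(tok.decode(out[0]))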

Uncovering mesa-optimization algorithms in transformers

J Von Oswald, M Schlegel, A Meulemans… - arXiv preprint arXiv …, 2023 - arxiv.org
Some autoregressive models exhibit in-context learning capabilities: being able to learn as
an input sequence is processed, without undergoing any parameter changes, and without …
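
A common toy illustration in this mesa-optimization line of work (sketched from the general literature, not this paper's exact construction): one step of gradient descent on in-context least squares, started from zero weights, produces a prediction that can be rewritten as a linear-attention-style weighted sum over the context, which is why trained transformers can implement such an optimizer in their forward pass (Python/NumPy sketch).

import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 32
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d)); y = X @ w_true
x_q = rng.standard_normal(d)
eta = 1.0 / n

# One GD step on L(W) = 0.5 * sum_i (y_i - W x_i)^2 from W0 = 0 gives
# W1 = eta * sum_i y_i x_i^T, so the query prediction is:
pred_gd = eta * sum(y[i] * (X[i] @ x_q) for i in range(n))

# The same quantity as an (unnormalized) linear-attention readout:
# values y_i weighted by key-query inner products x_i^T x_q.
pred_attn = eta * (y @ (X @ x_q))
print(np.isclose(pred_gd, pred_attn))   # True: identical by construction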