Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
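As a minimal sketch of the flipped-label setup the snippet mentions, the code below builds an ICL prompt for binary sentiment classification with the exemplar labels swapped; the example texts, label names, and the build_prompt helper are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the flipped-label ICL setup: in-context exemplars are
# shown with their labels swapped, and we check whether the model follows the
# flipped input-label mapping or falls back on its semantic priors.
# The example data and helper below are hypothetical, not from the paper.

def flip(label: str) -> str:
    return {"positive": "negative", "negative": "positive"}[label]

def build_prompt(exemplars, query_text, flip_labels=False):
    lines = []
    for text, label in exemplars:
        shown = flip(label) if flip_labels else label
        lines.append(f"Review: {text}\nSentiment: {shown}\n")
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n".join(lines)

exemplars = [
    ("A delightful, moving film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
]
print(build_prompt(exemplars, "I loved every minute of it.", flip_labels=True))
# A model that overrides its semantic prior should now answer "negative".
```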

Transformers learn to implement preconditioned gradient descent for in-context learning

K Ahn, X Cheng, H Daneshmand… - Advances in Neural …, 2023 - proceedings.neurips.cc
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …
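For intuition, here is one step of preconditioned gradient descent on an in-context linear regression task, the kind of update these constructions relate to a transformer layer; the data generation and the particular form of the preconditioner are assumptions for illustration, not the paper's construction.

```python
# Sketch: one preconditioned gradient descent step on in-context linear regression.
# Data generation and the preconditioner choice below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))           # in-context examples
y = X @ w_true                        # their labels
x_query = rng.normal(size=d)          # query to predict

w = np.zeros(d)                                    # implicit parameter being "updated"
P = np.linalg.inv(X.T @ X / n + 0.1 * np.eye(d))   # preconditioner (assumed form)
grad = X.T @ (X @ w - y) / n                       # gradient of the squared loss
w = w - P @ grad                                   # one preconditioned GD step

print("prediction:", x_query @ w, "target:", x_query @ w_true)
```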

Many-shot in-context learning

R Agarwal, A Singh, LM Zhang, B Bohnet… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) excel at few-shot in-context learning (ICL): learning from a
few examples provided in context at inference, without any weight updates. Newly expanded …

In-context unlearning: Language models as few shot unlearners

M Pawelczyk, S Neel, H Lakkaraju - arXiv preprint arXiv:2310.07579, 2023 - arxiv.org
Machine unlearning, the study of efficiently removing the impact of specific training points on
the trained model, has garnered increased attention of late, driven by the need to comply …

Combating noisy labels with sample selection by mining high-discrepancy examples

X Xia, B Han, Y Zhan, J Yu, M Gong… - Proceedings of the …, 2023 - openaccess.thecvf.com
The sample selection approach is popular in learning with noisy labels. The state-of-the-art
methods train two deep networks simultaneously for sample selection, which aims to employ …
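A rough sketch of the two-network, disagreement-based selection idea described in the snippet follows; the discrepancy measure (an L1 gap between predicted probabilities) and the selection ratio are placeholders, not the paper's exact criterion.

```python
# Rough sketch of two-network sample selection under label noise: keep the
# examples on which the two networks' predictions diverge most (high discrepancy).
# The discrepancy measure and the selection ratio are placeholder assumptions.
import numpy as np

def select_high_discrepancy(probs_a, probs_b, ratio=0.2):
    """probs_a, probs_b: (N, C) class probabilities from two networks."""
    discrepancy = np.abs(probs_a - probs_b).sum(axis=1)   # L1 gap per example
    k = int(len(discrepancy) * ratio)
    return np.argsort(-discrepancy)[:k]                   # indices of top-k gaps

rng = np.random.default_rng(1)
p_a = rng.dirichlet(np.ones(10), size=100)
p_b = rng.dirichlet(np.ones(10), size=100)
print(select_high_discrepancy(p_a, p_b)[:5])
```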

A survey on statistical theory of deep learning: Approximation, training dynamics, and generative models

N Suh, G Cheng - Annual Review of Statistics and Its Application, 2024 - annualreviews.org
In this article, we review the literature on statistical theories of neural networks from three
perspectives: approximation, training dynamics, and generative models. In the first part …

Are Emergent Abilities in Large Language Models just In-Context Learning?

S Lu, I Bigoulaeva, R Sachdeva, HT Madabushi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models have exhibited emergent abilities, demonstrating exceptional
performance across diverse tasks for which they were not explicitly trained, including those …

HumanMAC: Masked motion completion for human motion prediction

LH Chen, J Zhang, Y Li, Y Pang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human motion prediction is a classical problem in computer vision and computer graphics,
which has a wide range of practical applications. Previous efforts achieve great empirical …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in" Attention Is All You Need", transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …
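The snippet cuts off before stating what the attention layer admits; for reference, this is the standard single-head softmax attention computation that this line of work analyzes, with purely illustrative dimensions.

```python
# Standard single-head softmax attention, the object this line of work analyzes;
# the dimensions and random weights below are illustrative only.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """X: (T, d) token embeddings; returns (T, d) attention outputs."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot-product scores
    return softmax(scores, axis=-1) @ V        # convex combination of values

rng = np.random.default_rng(2)
T, d = 6, 8
X = rng.normal(size=(T, d))
out = attention(X, rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(out.shape)  # (6, 8)
```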

Uncovering mesa-optimization algorithms in transformers

J Von Oswald, M Schlegel, A Meulemans… - arXiv preprint arXiv …, 2023 - arxiv.org
Some autoregressive models exhibit in-context learning capabilities: being able to learn as
an input sequence is processed, without undergoing any parameter changes, and without …