Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
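
As a rough illustration of the flipped-label setup named in this snippet: the demonstration labels are deliberately inverted while the query's true label is kept, so a model that overrides its semantic priors in favor of the in-context input-label mapping will also flip its predictions. The sentiment task, example texts, and prompt format below are illustrative assumptions, not the paper's exact protocol (Python sketch).

# Sketch: a flipped-label ICL prompt for binary sentiment classification.
FLIP = {"positive": "negative", "negative": "positive"}

demos = [
    ("A wonderful, heartfelt film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
    ("I would happily watch it again.", "positive"),
]

def build_flipped_prompt(demos, query_text):
    # Demonstration labels are inverted; the query label is left for the model to fill in.
    lines = [f"Review: {text}\nSentiment: {FLIP[label]}" for text, label in demos]
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_flipped_prompt(demos, "An absolute joy from start to finish.")
print(prompt)  # a model that follows the in-context mapping should now answer "negative"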

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - arXiv preprint arXiv:2306.09927, 2023 - arxiv.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
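
The in-context linear regression setting behind this line of work can be sketched as follows: each prompt is a sequence of pairs (x_i, y_i) with y_i = <w, x_i> for a task-specific weight vector w drawn fresh per prompt, and the model must predict the label of a held-out query. The dimensions and distributions below are illustrative assumptions, with an ordinary least-squares baseline that trained transformers are typically compared against (Python/NumPy sketch).

import numpy as np

def make_icl_regression_prompt(d=8, n_context=16, rng=None):
    # Sample one in-context linear-regression task: context pairs (x_i, y_i)
    # with y_i = <w, x_i> for a task-specific w, plus a query point.
    rng = rng or np.random.default_rng()
    w = rng.standard_normal(d)                 # fresh task vector per prompt
    X = rng.standard_normal((n_context + 1, d))
    y = X @ w
    context = list(zip(X[:-1], y[:-1]))        # observed (x_i, y_i) pairs
    query_x, query_y = X[-1], y[-1]            # the model must predict query_y
    return context, query_x, query_y

context, query_x, query_y = make_icl_regression_prompt()
# Least-squares baseline fitted on the context alone:
Xc = np.stack([x for x, _ in context]); yc = np.array([y for _, y in context])
w_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
print(float(query_x @ w_hat), float(query_y))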

Max-margin token selection in attention mechanism

D Ataee Tarzanagh, Y Li, X Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc
The attention mechanism is a central component of the transformer architecture that led to the
phenomenal success of large language models. However, the theoretical principles …
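
As a hedged illustration of the flavor of result announced here (paraphrased from memory, not the paper's exact statement): for a single attention layer choosing among tokens x_{i,1}, …, x_{i,T} of sequence i with query token z_i, the trained attention weights are related in direction to a max-margin program that separates the selected token from the rest, roughly of the form

\min_{W} \; \|W\|_F
\quad \text{s.t.} \quad
(x_{i,\alpha_i} - x_{i,t})^\top W z_i \;\ge\; 1
\qquad \forall\, t \neq \alpha_i, \;\; \forall\, i,

where \alpha_i indexes the token that the attention head should select in sequence i.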

Label words are anchors: An information flow perspective for understanding in-context learning

L Wang, L Li, D Dai, D Chen, H Zhou, F Meng… - arXiv preprint arXiv …, 2023 - arxiv.org
In-context learning (ICL) emerges as a promising capability of large language models
(LLMs) when they are provided with demonstration examples to perform diverse tasks. However …
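
A toy sketch of the information-flow measurement suggested by the title: given a causal attention matrix over a prompt, compare how much attention mass the label-word positions aggregate from the preceding demonstration text with how much the final prediction position draws from those label words. The positions, random matrix, and scoring are placeholders, not the paper's saliency-based analysis (Python/NumPy sketch).

import numpy as np

def anchor_flow_scores(attn, label_positions, final_position):
    # attn: (T, T) row-stochastic causal attention matrix (row = query, column = key).
    text_to_labels = sum(attn[p, :p].sum() for p in label_positions)    # text -> label words
    labels_to_final = attn[final_position, label_positions].sum()       # label words -> prediction
    return text_to_labels, labels_to_final

T = 12
rng = np.random.default_rng(0)
A = np.tril(rng.random((T, T)))
A /= A.sum(axis=1, keepdims=True)   # normalize rows to mimic softmax attention
print(anchor_flow_scores(A, label_positions=[3, 7], final_position=T - 1))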

The mystery of in-context learning: A comprehensive survey on interpretation and analysis

Y Zhou, J Li, Y Xiang, H Yan, L Gui… - Proceedings of the 2024 …, 2024 - aclanthology.org
Understanding the in-context learning (ICL) capability that enables large language models
(LLMs) to perform tasks proficiently from demonstration examples is of utmost importance. This …

Are Emergent Abilities in Large Language Models just In-Context Learning?

S Lu, I Bigoulaeva, R Sachdeva, HT Madabushi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models have exhibited emergent abilities, demonstrating exceptional
performance across diverse tasks for which they were not explicitly trained, including those …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in "Attention Is All You Need", the transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …

What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization

Y Zhang, F Zhang, Z Yang, Z Wang - arXiv preprint arXiv:2305.19420, 2023 - arxiv.org
In this paper, we conduct a comprehensive study of In-Context Learning (ICL) by addressing
several open questions: (a) What type of ICL estimator is learned by large language …
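
The Bayesian model averaging view of ICL named in the title can be stated generically (standard form, not necessarily the paper's exact notation): with demonstrations D_n = {(x_i, y_i)}_{i=1}^n and a latent task variable \theta, the ICL prediction is read as a posterior predictive,

p(y \mid x, D_n) \;=\; \int p(y \mid x, \theta)\, p(\theta \mid D_n)\, d\theta,
\qquad
p(\theta \mid D_n) \;\propto\; p(\theta) \prod_{i=1}^{n} p(y_i \mid x_i, \theta),

so adding demonstrations sharpens p(\theta \mid D_n) and hence the implicit average over tasks.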

Function vectors in large language models

E Todd, ML Li, AS Sharma, A Mueller… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …
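
A rough sketch of the kind of activation-extraction-and-injection procedure this snippet points at, using PyTorch forward hooks on a GPT-2-style model: average a hidden state at a chosen layer and the final token position over several ICL prompts for one task, then add that vector back in during a zero-shot forward pass. The layer index, the `model.transformer.h[...]` module path, and the last-token position are assumptions for this sketch, not the paper's exact recipe.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 8  # assumed intervention layer

def last_token_state(prompt, layer=LAYER):
    # Hidden state of the final token at `layer` (residual-stream proxy).
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer][0, -1, :]

# 1) Average over ICL prompts for one task (antonyms here) to get a candidate "function vector".
icl_prompts = ["hot -> cold\nbig -> small\nfast ->", "up -> down\nwet -> dry\nold ->"]
task_vec = torch.stack([last_token_state(p) for p in icl_prompts]).mean(dim=0)

# 2) Add it at the same layer while running a zero-shot prompt.
def add_vec_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1, :] += task_vec          # inject at the current last token
    return output

handle = model.transformer.h[LAYER - 1].register_forward_hook(add_vec_hook)
with torch.no_grad():
    out = model.generate(tok("tall ->", return_tensors="pt").input_ids, max_new_tokens=3)
handle.remove()
print(tok.decode(out[0]))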

Uncovering mesa-optimization algorithms in transformers

J Von Oswald, M Schlegel, A Meulemans… - arXiv preprint arXiv …, 2023 - arxiv.org
Some autoregressive models exhibit in-context learning capabilities: being able to learn as
an input sequence is processed, without undergoing any parameter changes, and without …
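
A common toy illustration in this mesa-optimization line of work (sketched from the general literature, not this paper's exact construction): one step of gradient descent on in-context least squares, started from zero weights, produces a prediction that can be rewritten as a linear-attention-style weighted sum over the context, which is why trained transformers can implement such an optimizer in their forward pass (Python/NumPy sketch).

import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 32
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d)); y = X @ w_true
x_q = rng.standard_normal(d)
eta = 1.0 / n

# One GD step on L(W) = 0.5 * sum_i (y_i - W x_i)^2 from W0 = 0 gives
# W1 = eta * sum_i y_i x_i^T, so the query prediction is:
pred_gd = eta * sum(y[i] * (X[i] @ x_q) for i in range(n))

# The same quantity as an (unnormalized) linear-attention readout:
# values y_i weighted by key-query inner products x_i^T x_q.
pred_attn = eta * (y @ (X @ x_q))
print(np.isclose(pred_gd, pred_attn))   # True: identical by construction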