Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Brain-inspired learning in artificial neural networks: a review

S Schmidgall, R Ziaei, J Achterberg, L Kirsch… - APL Machine …, 2024 - pubs.aip.org
Artificial neural networks (ANNs) have emerged as an essential tool in machine learning,
achieving remarkable success across diverse domains, including image and speech …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

A survey on in-context learning

Q Dong, L Li, D Dai, C Zheng, J Ma, R Li, H Xia… - arXiv preprint arXiv …, 2022 - arxiv.org
With the increasing capabilities of large language models (LLMs), in-context learning (ICL)
has emerged as a new paradigm for natural language processing (NLP), where LLMs make …

Language models can solve computer tasks

G Kim, P Baldi, S McAleer - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Agents capable of carrying out general tasks on a computer can improve efficiency and
productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally …

Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …

Transformers learn to implement preconditioned gradient descent for in-context learning

K Ahn, X Cheng, H Daneshmand… - Advances in Neural …, 2023 - proceedings.neurips.cc
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Y Bai, F Chen, H Wang, C Xiong… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …

Jailbreak and guard aligned language models with only few in-context demonstrations

Z Wei, Y Wang, A Li, Y Mo, Y Wang - arXiv preprint arXiv:2310.06387, 2023 - arxiv.org
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …

Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers

D Dai, Y Sun, L Dong, Y Hao, S Ma, Z Sui… - arXiv preprint arXiv …, 2022 - arxiv.org
Large pretrained language models have shown surprising in-context learning (ICL) ability.
With a few demonstration input-label pairs, they can predict the label for an unseen input …