Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
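As a minimal sketch of the flipped-label setup the snippet mentions, the code below builds an ICL prompt for binary sentiment classification with the exemplar labels swapped; the example texts, label names, and the build_prompt helper are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the flipped-label ICL setup: in-context exemplars are
# shown with their labels swapped, and we check whether the model follows the
# flipped input-label mapping or falls back on its semantic priors.
# The example data and helper below are hypothetical, not from the paper.

def flip(label: str) -> str:
    return {"positive": "negative", "negative": "positive"}[label]

def build_prompt(exemplars, query_text, flip_labels=False):
    lines = []
    for text, label in exemplars:
        shown = flip(label) if flip_labels else label
        lines.append(f"Review: {text}\nSentiment: {shown}\n")
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n".join(lines)

exemplars = [
    ("A delightful, moving film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
]
print(build_prompt(exemplars, "I loved every minute of it.", flip_labels=True))
# A model that overrides its semantic prior should now answer "negative".
```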

Transformers learn to implement preconditioned gradient descent for in-context learning

K Ahn, X Cheng, H Daneshmand… - Advances in Neural …, 2023 - proceedings.neurips.cc
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …
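For intuition, here is one step of preconditioned gradient descent on an in-context linear regression task, the kind of update these constructions relate to a transformer layer; the data generation and the particular form of the preconditioner are assumptions for illustration, not the paper's construction.

```python
# Sketch: one preconditioned gradient descent step on in-context linear regression.
# Data generation and the preconditioner choice below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))           # in-context examples
y = X @ w_true                        # their labels
x_query = rng.normal(size=d)          # query to predict

w = np.zeros(d)                                    # implicit parameter being "updated"
P = np.linalg.inv(X.T @ X / n + 0.1 * np.eye(d))   # preconditioner (assumed form)
grad = X.T @ (X @ w - y) / n                       # gradient of the squared loss
w = w - P @ grad                                   # one preconditioned GD step

print("prediction:", x_query @ w, "target:", x_query @ w_true)
```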

Many-shot in-context learning

R Agarwal, A Singh, LM Zhang, B Bohnet… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) excel at few-shot in-context learning (ICL): learning from a
few examples provided in context at inference, without any weight updates. Newly expanded …

In-context unlearning: Language models as few shot unlearners

M Pawelczyk, S Neel, H Lakkaraju - arXiv preprint arXiv:2310.07579, 2023 - arxiv.org
Machine unlearning, the study of efficiently removing the impact of specific training points on
the trained model, has garnered increased attention of late, driven by the need to comply …

Combating noisy labels with sample selection by mining high-discrepancy examples

X Xia, B Han, Y Zhan, J Yu, M Gong… - Proceedings of the …, 2023 - openaccess.thecvf.com
The sample selection approach is popular in learning with noisy labels. The state-of-the-art
methods train two deep networks simultaneously for sample selection, which aims to employ …
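A rough sketch of the two-network, disagreement-based selection idea described in the snippet follows; the discrepancy measure (an L1 gap between predicted probabilities) and the selection ratio are placeholders, not the paper's exact criterion.

```python
# Rough sketch of two-network sample selection under label noise: keep the
# examples on which the two networks' predictions diverge most (high discrepancy).
# The discrepancy measure and the selection ratio are placeholder assumptions.
import numpy as np

def select_high_discrepancy(probs_a, probs_b, ratio=0.2):
    """probs_a, probs_b: (N, C) class probabilities from two networks."""
    discrepancy = np.abs(probs_a - probs_b).sum(axis=1)   # L1 gap per example
    k = int(len(discrepancy) * ratio)
    return np.argsort(-discrepancy)[:k]                   # indices of top-k gaps

rng = np.random.default_rng(1)
p_a = rng.dirichlet(np.ones(10), size=100)
p_b = rng.dirichlet(np.ones(10), size=100)
print(select_high_discrepancy(p_a, p_b)[:5])
```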

A survey on statistical theory of deep learning: Approximation, training dynamics, and generative models

N Suh, G Cheng - Annual Review of Statistics and Its Application, 2024 - annualreviews.org
In this article, we review the literature on statistical theories of neural networks from three
perspectives: approximation, training dynamics, and generative models. In the first part …

Are Emergent Abilities in Large Language Models just In-Context Learning?

S Lu, I Bigoulaeva, R Sachdeva, HT Madabushi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models have exhibited emergent abilities, demonstrating exceptional
performance across diverse tasks for which they were not explicitly trained, including those …

HumanMAC: Masked motion completion for human motion prediction

LH Chen, J Zhang, Y Li, Y Pang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human motion prediction is a classical problem in computer vision and computer graphics,
which has a wide range of practical applications. Previous efforts achieve great empirical …

Transformers as support vector machines

DA Tarzanagh, Y Li, C Thrampoulidis… - arXiv preprint arXiv …, 2023 - arxiv.org
Since its inception in" Attention Is All You Need", transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …
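The snippet cuts off before stating what the attention layer admits; for reference, this is the standard single-head softmax attention computation that this line of work analyzes, with purely illustrative dimensions.

```python
# Standard single-head softmax attention, the object this line of work analyzes;
# the dimensions and random weights below are illustrative only.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """X: (T, d) token embeddings; returns (T, d) attention outputs."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled dot-product scores
    return softmax(scores, axis=-1) @ V        # convex combination of values

rng = np.random.default_rng(2)
T, d = 6, 8
X = rng.normal(size=(T, d))
out = attention(X, rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(out.shape)  # (6, 8)
```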

Uncovering mesa-optimization algorithms in transformers

J Von Oswald, M Schlegel, A Meulemans… - arXiv preprint arXiv …, 2023 - arxiv.org
Some autoregressive models exhibit in-context learning capabilities: being able to learn as
an input sequence is processed, without undergoing any parameter changes, and without …