Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …

How do transformers learn topic structure: Towards a mechanistic understanding

Y Li, Y Li, A Risteski - International Conference on Machine …, 2023 - proceedings.mlr.press
While the successes of transformers across many domains are indisputable, an accurate
understanding of the learning mechanics is still largely lacking. Their capabilities have been …

On the power of foundation models

Y Yuan - International Conference on Machine Learning, 2023 - proceedings.mlr.press
With infinitely many high-quality data points, infinite computational power, an infinitely large
foundation model with a perfect training algorithm and guaranteed zero generalization error …

The mechanism of prediction head in non-contrastive self-supervised learning

Z Wen, Y Li - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
The surprising discovery of the BYOL method shows that negative samples can be replaced
by adding a prediction head to the network. It is mysterious why, even when there exist …

Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …

Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions

HS Bovbjerg, J Jensen, J Østergaard… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we propose the use of self-supervised pretraining on a large unlabelled data
set to improve the performance of a personalized voice activity detection (VAD) model in …

Towards Understanding Embeddings of Neural Networks: A Theoretical Perspective

J Wei - 2024 - search.proquest.com
Theoretically understanding the success of modern neural networks remains challenging. In
the direction of a theoretical understanding of fully connected Multilayer Perceptrons (MLPs) …