Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …

How do transformers learn topic structure: Towards a mechanistic understanding

Y Li, Y Li, A Risteski - International Conference on Machine …, 2023 - proceedings.mlr.press
While the successes of transformers across many domains are indisputable, an accurate
understanding of the learning mechanics is still largely lacking. Their capabilities have been …

On the power of foundation models

Y Yuan - International Conference on Machine Learning, 2023 - proceedings.mlr.press
With infinitely many high-quality data points, infinite computational power, an infinitely large
foundation model with a perfect training algorithm and guaranteed zero generalization error …

The mechanism of prediction head in non-contrastive self-supervised learning

Z Wen, Y Li - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
The surprising discovery of the BYOL method shows that negative samples can be replaced
by adding a prediction head to the network. It is mysterious why, even when there exist …

Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …

Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions

HS Bovbjerg, J Jensen, J Østergaard… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we propose the use of self-supervised pretraining on a large unlabelled data
set to improve the performance of a personalized voice activity detection (VAD) model in …

Towards Understanding Embeddings of Neural Networks: A Theoretical Perspective

J Wei - 2024 - search.proquest.com
Theoretically understanding the success of modern neural networks remains challenging. In
the direction of a theoretical understanding of fully connected Multilayer Perceptrons (MLPs) …