Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of …
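A minimal numerical sketch of this idea (not any single paper's exact construction): with linear, softmax-free attention, a zero initial weight vector, and the in-context examples packed as key/value pairs, the attention readout at the query token coincides with the prediction after one gradient-descent step on in-context linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 8, 0.1

X = rng.normal(size=(n, d))        # in-context inputs x_i
y = X @ rng.normal(size=d)         # in-context targets y_i
x_q = rng.normal(size=d)           # query input

# One gradient-descent step on L(w) = 0.5 * sum_i (w @ x_i - y_i)^2 from w0 = 0:
# w1 = w0 - eta * grad L(w0) = eta * sum_i y_i * x_i
w1 = eta * (y @ X)
gd_pred = w1 @ x_q

# Linear (softmax-free) attention with constructed weights: queries/keys carry
# the x_i, values carry the y_i, and the readout is scaled by eta.
attn_pred = eta * np.sum((X @ x_q) * y)

assert np.allclose(gd_pred, attn_pred)  # the two predictions coincide
```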
Large language models (LLMs) excel at few-shot in-context learning (ICL): learning from a few examples provided in context at inference, without any weight updates. Newly expanded …
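A toy illustration of the setting (the task and prompt format here are hypothetical): the in-context examples fully specify the task, and the model completes the final line at inference time with no gradient update.

```python
# Hypothetical few-shot prompt: the antonym task is defined entirely by the
# in-context examples; a capable LLM completes the last line ("heavy")
# without any weight update.
prompt = (
    "hot -> cold\n"
    "tall -> short\n"
    "fast -> slow\n"
    "light -> "
)
print(prompt)
```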
Machine unlearning, the study of efficiently removing the impact of specific training points on the trained model, has garnered increased attention of late, driven by the need to comply …
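A minimal sketch of the problem setup, using ordinary least squares as a stand-in learner: the gold standard for removing a training point's impact is retraining from scratch without it, and efficient unlearning methods aim to approximate this reference model cheaply.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def fit(X, y):
    # Ordinary least squares as a stand-in for "training a model"
    return np.linalg.lstsq(X, y, rcond=None)[0]

forget = [3, 17, 42]                       # training points to unlearn
keep = np.setdiff1d(np.arange(len(y)), forget)

w_full = fit(X, y)                         # model trained on all data
w_retrain = fit(X[keep], y[keep])          # exact unlearning: retrain without them

# The gap quantifies the deleted points' influence; efficient methods try to
# approach w_retrain without paying for full retraining.
print(np.linalg.norm(w_full - w_retrain))
```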
The sample selection approach is popular in learning with noisy labels. State-of-the-art methods train two deep networks simultaneously for sample selection, aiming to employ …
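A minimal sketch of the small-loss selection and cross-update step in such a two-network (co-teaching-style) scheme, assuming per-sample losses have already been computed; the forget rate and batch size are illustrative.

```python
import numpy as np

def select_small_loss(losses, forget_rate):
    # Keep the (1 - forget_rate) fraction of samples with the smallest loss:
    # small-loss samples are more likely to be correctly labeled.
    k = int(len(losses) * (1.0 - forget_rate))
    return np.argsort(losses)[:k]

rng = np.random.default_rng(0)
losses_a = rng.exponential(size=32)   # per-sample losses under network A
losses_b = rng.exponential(size=32)   # per-sample losses under network B

# Cross-update: each network is trained on the samples its *peer* judged
# clean, so the two networks' selection errors do not reinforce each other.
batch_for_b = select_small_loss(losses_a, forget_rate=0.2)
batch_for_a = select_small_loss(losses_b, forget_rate=0.2)
```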
N Suh, G Cheng - Annual Review of Statistics and Its Application, 2024 - annualreviews.org
In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part …
Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those …
LH Chen, J Zhang, Y Li, Y Pang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human motion prediction is a classical problem in computer vision and computer graphics with a wide range of practical applications. Previous efforts achieve great empirical …
Since its inception in "Attention Is All You Need", the transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a …
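For reference, the scaled dot-product attention layer from "Attention Is All You Need", softmax(QK^T / sqrt(d_k))V, in a self-contained NumPy sketch:

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    # with the softmax taken row-wise and shifted for numerical stability.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query tokens, key dimension 8
K = rng.normal(size=(6, 8))   # 6 key tokens
V = rng.normal(size=(6, 3))   # values, dimension 3
out = attention(Q, K, V)      # shape (4, 3)
```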
Some autoregressive models exhibit in-context learning capabilities: the ability to learn as an input sequence is processed, without undergoing any parameter changes, and without …