Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Is a picture worth a thousand words? Delving into spatial reasoning for vision language models

J Wang, Y Ming, Z Shi, V Vineet, X Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) and vision-language models (VLMs) have demonstrated
remarkable performance across a wide range of tasks and domains. Despite this promise …

Unraveling the smoothness properties of diffusion models: A Gaussian mixture perspective

Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16418, 2024 - arxiv.org
Diffusion models have made rapid progress in generating high-quality samples across
various domains. However, a theoretical understanding of the Lipschitz continuity and …

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

Z Xu, Z Shi, J Wei, F Mu, Y Li, Y Liang - arXiv preprint arXiv:2402.15017, 2024 - arxiv.org
Foundation models have emerged as a powerful tool for many AI problems. Despite the
tremendous success of foundation models, effective adaptation to new tasks, particularly …

Differential privacy mechanisms in neural tangent kernel regression

J Gu, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2407.13621, 2024 - arxiv.org
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI)
applications, such as face recognition, recommendation systems, language generation, and …

Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond

J Gu, C Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2405.03251, 2024 - arxiv.org
The softmax activation function plays a crucial role in the success of large language models
(LLMs), particularly in the self-attention mechanism of the widely adopted Transformer …

Do large language models have compositional ability? An investigation into limitations and scalability

Z Xu, Z Shi, Y Liang - ICLR 2024 Workshop on Mathematical and …, 2024 - openreview.net
Large language models (LLMs) have emerged as a powerful tool exhibiting remarkable in-
context learning (ICL) capabilities. In this study, we delve into the ICL capabilities of LLMs on …

AdaInf: Adaptive inference for resource-constrained foundation models

Z Xu, KD Nguyen, P Mukherjee, S Chaterji… - Workshop on Efficient …, 2024 - openreview.net
Foundation models have emerged as a powerful tool in AI, yet come with substantial
computational cost, limiting their deployment on resource-constrained devices. Several recent …