Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Is a picture worth a thousand words? Delving into spatial reasoning for vision language models

J Wang, Y Ming, Z Shi, V Vineet, X Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) and vision-language models (VLMs) have demonstrated
remarkable performance across a wide range of tasks and domains. Despite this promise …

Unraveling the smoothness properties of diffusion models: A Gaussian mixture perspective

Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16418, 2024 - arxiv.org
Diffusion models have made rapid progress in generating high-quality samples across
various domains. However, a theoretical understanding of the Lipschitz continuity and …

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

Z Xu, Z Shi, J Wei, F Mu, Y Li, Y Liang - arXiv preprint arXiv:2402.15017, 2024 - arxiv.org
Foundation models have emerged as a powerful tool for many AI problems. Despite the
tremendous success of foundation models, effective adaptation to new tasks, particularly …

Differential privacy mechanisms in neural tangent kernel regression

J Gu, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2407.13621, 2024 - arxiv.org
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI)
applications, such as face recognition, recommendation systems, language generation, and …

Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond

J Gu, C Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2405.03251, 2024 - arxiv.org
The softmax activation function plays a crucial role in the success of large language models
(LLMs), particularly in the self-attention mechanism of the widely adopted Transformer …

Do large language models have compositional ability? An investigation into limitations and scalability

Z Xu, Z Shi, Y Liang - ICLR 2024 Workshop on Mathematical and …, 2024 - openreview.net
Large language models (LLMs) have emerged as a powerful tool exhibiting remarkable in-
context learning (ICL) capabilities. In this study, we delve into the ICL capabilities of LLMs on …

AdaInf: Adaptive inference for resource-constrained foundation models

Z Xu, KD Nguyen, P Mukherjee, S Chaterji… - Workshop on Efficient …, 2024 - openreview.net
Foundation models have emerged as a powerful tool in AI, yet come with substantial
computational cost, limiting their deployment on resource-constrained devices. Several recent …