Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

Z Wang, Y Wang, Z Zhang, Z Zhou, H Jin, T Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models have consistently struggled with complex reasoning tasks, such as
mathematical problem-solving. Investigating the internal reasoning mechanisms of these …

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

Anchor Attention, Small Cache: Code Generation with Large Language Models

X Zhang, Y Zhou, G Yang, HC Gall, T Chen - arXiv preprint arXiv …, 2024 - arxiv.org
The development of large language models (LLMs) has revolutionized automated code
generation. However, their high demand for computation resources has hindered a broader …

Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

M Wang - arXiv preprint arXiv:2402.00522, 2024 - arxiv.org
We conduct a systematic study of the approximation properties of Transformer for sequence
modeling with long, sparse and complicated memory. We investigate the mechanisms …