Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

A Testolin - Applied Sciences, 2024 - mdpi.com
Creating learning models that can exhibit sophisticated reasoning abilities is one of the
greatest challenges in deep learning research, and mathematics is rapidly becoming one of …

Amortizing intractable inference in large language models

EJ Hu, M Jain, E Elmoznino, Y Kaddar, G Lajoie… - arXiv preprint arXiv …, 2023 - arxiv.org
Autoregressive large language models (LLMs) compress knowledge from their training data
through next-token conditional distributions. This limits tractable querying of this knowledge …

Repeat after me: Transformers are better than state space models at copying

S Jelassi, D Brandfonbrener, SM Kakade… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …

GPT can solve mathematical problems without a calculator

Z Yang, M Ding, Q Lv, Z Jiang, Z He, Y Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Previous studies have typically assumed that large language models are unable to
accurately perform arithmetic operations, particularly multiplication of >8 digits, and …

Transformers can achieve length generalization but not robustly

Y Zhou, U Alon, X Chen, X Wang, R Agarwal… - arXiv preprint arXiv …, 2024 - arxiv.org
Length generalization, defined as the ability to extrapolate from shorter training sequences
to longer test ones, is a significant challenge for language models. This issue persists even …

Positional description matters for transformers arithmetic

R Shen, S Bubeck, R Eldan, YT Lee, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers, central to the successes in modern Natural Language Processing, often falter
on arithmetic tasks despite their vast capabilities--which paradoxically include remarkable …

SNIP: Bridging mathematical symbolic and numeric realms with unified pre-training

K Meidani, P Shojaee, CK Reddy… - arXiv preprint arXiv …, 2023 - arxiv.org
In an era where symbolic mathematical equations are indispensable for modeling complex
natural phenomena, scientific inquiry often involves collecting observations and translating …

Improving length-generalization in transformers via task hinting

P Awasthi, A Gupta - arXiv preprint arXiv:2310.00726, 2023 - arxiv.org
It has been observed in recent years that transformers have problems with length
generalization for certain types of reasoning and arithmetic tasks. In particular, the …

Transformers Can Do Arithmetic with the Right Embeddings

S McLeish, A Bansal, A Stein, N Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The poor performance of transformers on arithmetic tasks seems to stem in large part from
their inability to keep track of the exact position of each digit inside of a large span of digits …

Benchmarking GPT-4 on algorithmic problems: A systematic evaluation of prompting strategies

F Petruzzellis, A Testolin, A Sperduti - arXiv preprint arXiv:2402.17396, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized the field of Natural Language
Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a …