Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

A Testolin - Applied Sciences, 2024 - mdpi.com
Creating learning models that can exhibit sophisticated reasoning abilities is one of the
greatest challenges in deep learning research, and mathematics is rapidly becoming one of …

Amortizing intractable inference in large language models

EJ Hu, M Jain, E Elmoznino, Y Kaddar, G Lajoie… - arXiv preprint arXiv …, 2023 - arxiv.org
Autoregressive large language models (LLMs) compress knowledge from their training data
through next-token conditional distributions. This limits tractable querying of this knowledge …

Repeat after me: Transformers are better than state space models at copying

S Jelassi, D Brandfonbrener, SM Kakade… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …

GPT can solve mathematical problems without a calculator

Z Yang, M Ding, Q Lv, Z Jiang, Z He, Y Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Previous studies have typically assumed that large language models are unable to
accurately perform arithmetic operations, particularly multiplication of >8 digits, and …

Transformers can achieve length generalization but not robustly

Y Zhou, U Alon, X Chen, X Wang, R Agarwal… - arXiv preprint arXiv …, 2024 - arxiv.org
Length generalization, defined as the ability to extrapolate from shorter training sequences
to longer test ones, is a significant challenge for language models. This issue persists even …

Positional description matters for transformers arithmetic

R Shen, S Bubeck, R Eldan, YT Lee, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers, central to the successes in modern Natural Language Processing, often falter
on arithmetic tasks despite their vast capabilities--which paradoxically include remarkable …

SNIP: Bridging mathematical symbolic and numeric realms with unified pre-training

K Meidani, P Shojaee, CK Reddy… - arXiv preprint arXiv …, 2023 - arxiv.org
In an era where symbolic mathematical equations are indispensable for modeling complex
natural phenomena, scientific inquiry often involves collecting observations and translating …

Improving length-generalization in transformers via task hinting

P Awasthi, A Gupta - arXiv preprint arXiv:2310.00726, 2023 - arxiv.org
It has been observed in recent years that transformers have problems with length
generalization for certain types of reasoning and arithmetic tasks. In particular, the …

Transformers Can Do Arithmetic with the Right Embeddings

S McLeish, A Bansal, A Stein, N Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The poor performance of transformers on arithmetic tasks seems to stem in large part from
their inability to keep track of the exact position of each digit inside of a large span of digits …

Benchmarking GPT-4 on algorithmic problems: A systematic evaluation of prompting strategies

F Petruzzellis, A Testolin, A Sperduti - arXiv preprint arXiv:2402.17396, 2024 - arxiv.org
Large Language Models (LLMs) have revolutionized the field of Natural Language
Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a …