Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …
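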
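As context for the model families this survey covers, the common core of S4/Mamba-style layers is a discrete linear state space recurrence h_t = A h_{t-1} + B u_t, y_t = C h_t. The NumPy sketch below is an illustrative toy of that recurrence only; the structured parameterizations, discretization, and parallel scans of the surveyed models are not shown, and all names and dimensions here are made up.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Toy discrete state space model: h_t = A h_{t-1} + B u_t, y_t = C h_t.

    A: (d, d) state transition, B: (d, 1) input map, C: (1, d) readout,
    u: (T,) scalar input sequence. Returns y: (T,) scalar outputs.
    Illustrative only -- real SSM layers (S4, Mamba, ...) use structured A,
    a discretization step, and parallel scans instead of this Python loop.
    """
    d = A.shape[0]
    h = np.zeros((d, 1))
    ys = []
    for u_t in u:
        h = A @ h + B * u_t              # state update
        ys.append((C @ h).item())        # scalar readout
    return np.array(ys)

rng = np.random.default_rng(0)
d, T = 4, 16
A = 0.9 * np.eye(d)                      # stable toy dynamics
B = rng.normal(size=(d, 1))
C = rng.normal(size=(1, d))
u = rng.normal(size=T)
print(ssm_scan(A, B, C, u).shape)        # (16,)
```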

Griffin: Mixing gated linear recurrences with local attention for efficient language models

S De, SL Smith, A Fernando, A Botev… - arXiv preprint arXiv …, 2024 - arxiv.org
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long
sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with …
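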
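The two ingredients named in the title are gated linear recurrences and local (sliding-window) attention. The sketch below shows a generic element-wise gated linear recurrence and a causal sliding-window mask; it is not the paper's RG-LRU block or Griffin's actual layer mix, and the gating form and shapes are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_linear_recurrence(x, W_a):
    """Element-wise gated linear recurrence (toy version):
        a_t = sigmoid(x_t W_a)
        h_t = a_t * h_{t-1} + (1 - a_t) * x_t
    x: (T, d) inputs, W_a: (d, d). Returns (T, d) hidden states.
    Linear in h, so recurrent inference is a cheap per-step update.
    """
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        a = sigmoid(x[t] @ W_a)          # input-dependent decay gate
        h = a * h + (1.0 - a) * x[t]
        out[t] = h
    return out

def local_attention_mask(T, window):
    """Causal sliding-window mask: token t may attend to [t-window+1, t]."""
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    return (j <= i) & (j > i - window)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 8))
h = gated_linear_recurrence(x, 0.1 * rng.normal(size=(8, 8)))
print(h.shape, local_attention_mask(12, window=4).sum())
```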

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
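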
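The duality in the title can be illustrated in a scalar toy case: the same sequence map can be computed either as a linear recurrence or as multiplication by a lower-triangular (semiseparable) matrix, which is the attention-like view. The check below is a hand-rolled illustration of that general idea, not the paper's SSD algorithm.

```python
import numpy as np

def ssm_recurrent(a, x):
    """Scalar SSM as a recurrence: h_t = a_t * h_{t-1} + x_t, y_t = h_t."""
    h, ys = 0.0, []
    for a_t, x_t in zip(a, x):
        h = a_t * h + x_t
        ys.append(h)
    return np.array(ys)

def ssm_as_matrix(a, x):
    """Same map written as y = L x, where L is the lower-triangular
    'attention-like' matrix L[t, s] = a_{s+1} * ... * a_t for s <= t
    (empty product = 1 on the diagonal): a 1-semiseparable matrix."""
    T = len(a)
    L = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            L[t, s] = np.prod(a[s + 1:t + 1])
    return L @ x

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=10)   # per-step decays
x = rng.normal(size=10)
print(np.allclose(ssm_recurrent(a, x), ssm_as_matrix(a, x)))  # True
```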

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

A Orvieto, S De, C Gulcehre, R Pascanu… - Forty-first International …, 2024 - openreview.net
Deep neural networks based on linear RNNs interleaved with position-wise MLPs are
gaining traction as competitive approaches for sequence modeling. Examples of such …
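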
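The architecture class studied here is linear RNNs (with possibly complex, diagonal state matrices) interleaved with position-wise MLPs. Below is a minimal NumPy sketch of one such block, with made-up shapes and initialization; it is only meant to show where the complex eigenvalues and the nonlinear projection sit, not to reproduce any model from the paper.

```python
import numpy as np

def complex_diag_recurrence(lmbda, B, x):
    """Diagonal linear recurrence with complex eigenvalues:
        h_t = lmbda * h_{t-1} + B x_t   (element-wise in the complex state)
    lmbda: (n,) complex eigenvalues with |lmbda| < 1, B: (n, d) complex,
    x: (T, d) real. Returns real features (T, 2n) = [Re(h), Im(h)]."""
    T, n = x.shape[0], lmbda.shape[0]
    h = np.zeros(n, dtype=complex)
    feats = np.empty((T, 2 * n))
    for t in range(T):
        h = lmbda * h + B @ x[t]
        feats[t] = np.concatenate([h.real, h.imag])
    return feats

def mlp(z, W1, W2):
    """Position-wise nonlinear projection applied independently at each step."""
    return np.maximum(z @ W1, 0.0) @ W2

rng = np.random.default_rng(0)
n, d, T = 6, 4, 20
radius = rng.uniform(0.8, 0.99, n)
theta = rng.uniform(0, np.pi, n)
lmbda = radius * np.exp(1j * theta)          # eigenvalues inside the unit disk
B = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
feats = complex_diag_recurrence(lmbda, B, rng.normal(size=(T, d)))
y = mlp(feats, rng.normal(size=(2 * n, 16)), rng.normal(size=(16, d)))
print(y.shape)  # (20, 4)
```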

Zamba: A Compact 7B SSM Hybrid Model

P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which
achieves competitive performance against leading open-weight models at a comparable …
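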
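The snippet does not describe Zamba's specific layout, so the sketch below only shows the generic idea of an SSM-transformer hybrid: mostly SSM blocks with attention blocks inserted periodically. The schedule and all names are hypothetical and for illustration only.

```python
def hybrid_layer_plan(n_layers, attn_every):
    """Generic SSM/attention hybrid schedule (illustrative, not Zamba's layout):
    mostly SSM blocks, with a full-attention block every `attn_every` layers."""
    return [
        "attention" if (i + 1) % attn_every == 0 else "ssm"
        for i in range(n_layers)
    ]

print(hybrid_layer_plan(n_layers=12, attn_every=6))
# ['ssm', 'ssm', 'ssm', 'ssm', 'ssm', 'attention', 'ssm', ...]
```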

MemLLM: Finetuning LLMs to use an explicit read-write memory

A Modarressi, A Köksal, A Imani, M Fayyaz… - arXiv preprint arXiv …, 2024 - arxiv.org
While current large language models (LLMs) demonstrate some capabilities in knowledge-
intensive tasks, they are limited by relying on their parameters as an implicit storage …
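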
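To make "explicit read-write memory" concrete, here is a toy key-value store with the kind of read/write calls such an interface might expose. The actual MemLLM API is not shown in the snippet, so every name and method here is hypothetical.

```python
class ExplicitMemory:
    """Toy read-write memory an LLM could be finetuned to call via tool-style
    commands. Hypothetical interface for illustration; not MemLLM's actual API."""

    def __init__(self):
        self._store = {}                       # (subject, relation) -> set of objects

    def write(self, subject, relation, obj):
        self._store.setdefault((subject, relation), set()).add(obj)

    def read(self, subject, relation):
        return sorted(self._store.get((subject, relation), set()))

mem = ExplicitMemory()
mem.write("Marie Curie", "field", "physics")
mem.write("Marie Curie", "field", "chemistry")
print(mem.read("Marie Curie", "field"))        # ['chemistry', 'physics']
```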

RNNs are not Transformers (yet): The key bottleneck on in-context retrieval

K Wen, X Dang, K Lyu - arXiv preprint arXiv:2402.18510, 2024 - arxiv.org
This paper investigates the gap in representation powers of Recurrent Neural Networks
(RNNs) and Transformers in the context of solving algorithmic problems. We focus on …
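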
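The specific algorithmic tasks studied in the paper are not visible in the snippet; the generator below is only a standard associative-recall-style probe of the in-context retrieval ability named in the title, where a fixed-size recurrent state becomes the bottleneck once the context holds more pairs than the state can store.

```python
import random

def associative_recall_example(n_pairs, seed=0):
    """Toy in-context retrieval probe: list key-value pairs, then query one key.
    Answering requires retrieving from the full context, which is easy for
    attention but hard for a fixed-size recurrent state at large n_pairs.
    Illustrative task format only."""
    rng = random.Random(seed)
    keys = rng.sample(range(100, 1000), n_pairs)
    pairs = {k: rng.randint(0, 9) for k in keys}
    query = rng.choice(keys)
    prompt = " ".join(f"{k}->{v}" for k, v in pairs.items()) + f" ? {query}"
    return prompt, pairs[query]

prompt, answer = associative_recall_example(n_pairs=8)
print(prompt)
print("expected answer:", answer)
```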

Separations in the Representational Capabilities of Transformers and Recurrent Architectures

S Bhattamishra, M Hahn, P Blunsom… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer architectures have been widely adopted in foundation models. Due to their high
inference costs, there is renewed interest in exploring the potential of efficient recurrent …

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Y Aksenov, N Balagansky, SMLC Vaina… - arXiv preprint arXiv …, 2024 - arxiv.org
Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in
the rapidly evolving field of natural language processing. Current innovations, including …
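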
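Linear attention replaces the softmax with kernel feature maps phi(q), phi(k) so that causal attention reduces to running sums, giving subquadratic cost. The sketch below uses a generic learnable feature map (an affine map followed by elu + 1) as a stand-in; it is not the kernel parameterization proposed in this paper.

```python
import numpy as np

def feature_map(x, W):
    """Learnable kernel feature map phi(x) = elu(x W) + 1 (a common choice;
    the paper parameterizes its own kernel -- this is a generic stand-in)."""
    z = x @ W
    return np.where(z > 0, z + 1.0, np.exp(z))   # elu(z) + 1, strictly positive

def causal_linear_attention(Q, K, V, W):
    """Causal linear attention: running sums over phi(k_s) v_s^T and phi(k_s)
    replace the softmax, giving O(T) time and constant-size state per step."""
    phi_q, phi_k = feature_map(Q, W), feature_map(K, W)
    d_k, d_v = phi_q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))          # running sum of phi(k_s) v_s^T
    z = np.zeros(d_k)                 # running sum of phi(k_s), for normalization
    out = np.empty_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(phi_k[t], V[t])
        z += phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
T, d = 16, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
print(causal_linear_attention(Q, K, V, rng.normal(size=(d, d))).shape)  # (16, 8)
```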

An Empirical Study of Mamba-based Language Models

R Waleffe, W Byeon, D Riach, B Norick… - arXiv preprint arXiv …, 2024 - arxiv.org
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of
Transformers, such as quadratic computational complexity with sequence length and large …
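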
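The "selective" in selective SSMs refers to input-dependent recurrence parameters: the discretization step and the input/output projections are computed from the current token, so the state can retain or forget context selectively. The sketch below is a simplified single-output scan in that spirit; the shapes and parameterization are assumptions, not the Mamba implementation used in this study.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def selective_ssm(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM scan (Mamba-style in spirit, not the real kernel):
        delta_t = softplus(x_t W_delta)                 # per-state step sizes
        h_t     = exp(delta_t * A) * h_{t-1} + delta_t * (W_B x_t)
        y_t     = (W_C x_t) . h_t
    x: (T, d); A: (n,) negative reals for stability. Returns y: (T,)."""
    T, n = x.shape[0], A.shape[0]
    h = np.zeros(n)
    y = np.empty(T)
    for t in range(T):
        delta = softplus(x[t] @ W_delta)               # input-dependent discretization
        h = np.exp(delta * A) * h + delta * (W_B @ x[t])
        y[t] = (W_C @ x[t]) @ h                        # input-dependent readout
    return y

rng = np.random.default_rng(0)
T, d, n = 32, 8, 4
x = rng.normal(size=(T, d))
A = -np.abs(rng.normal(size=n))                        # negative decays => stable state
y = selective_ssm(x, A, rng.normal(size=(d, n)),
                  rng.normal(size=(n, d)), rng.normal(size=(n, d)))
print(y.shape)  # (32,)
```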