Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with …
T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown …
Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling. Examples of such …
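For illustration, a minimal NumPy sketch of the block pattern this snippet describes, a linear (diagonal) recurrence over the sequence followed by a position-wise MLP. The parameterization, shapes, and function names are assumptions made for exposition, not the construction used in any particular paper.

```python
import numpy as np

def linear_recurrence(x, a, B, C):
    """Diagonal linear RNN: h_t = a * h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d_in); a: (d_state,) per-channel decay; B: (d_state, d_in);
    C: (d_out, d_state). One O(T) pass with a fixed-size state.
    """
    T = x.shape[0]
    h = np.zeros(B.shape[0])
    ys = np.empty((T, C.shape[0]))
    for t in range(T):
        h = a * h + B @ x[t]       # linear state update (no nonlinearity)
        ys[t] = C @ h
    return ys

def position_wise_mlp(y, W1, b1, W2, b2):
    """Two-layer MLP applied independently at each position."""
    return np.maximum(y @ W1 + b1, 0.0) @ W2 + b2

# One "linear RNN + position-wise MLP" block on a toy sequence
# (all sizes below are illustrative).
rng = np.random.default_rng(0)
T, d = 16, 8
x = rng.normal(size=(T, d))
a = rng.uniform(0.5, 0.99, size=d)                  # stable decays in (0, 1)
B = rng.normal(size=(d, d)) / np.sqrt(d)
C = rng.normal(size=(d, d)) / np.sqrt(d)
W1, b1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d), np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d), np.zeros(d)

out = position_wise_mlp(linear_recurrence(x, a, B, C), W1, b1, W2, b2)
print(out.shape)  # (16, 8)
```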
P Glorioso, Q Anthony, Y Tokpanov… - arXiv preprint arXiv …, 2024 - arxiv.org
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable …
While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage …
K Wen, X Dang, K Lyu - arXiv preprint arXiv:2402.18510, 2024 - arxiv.org
This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on …
Transformer architectures have been widely adopted in foundation models. Due to their high inference costs, there is renewed interest in exploring the potential of efficient recurrent …
Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including …
R Waleffe, W Byeon, D Riach, B Norick… - arXiv preprint arXiv …, 2024 - arxiv.org
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large …
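For illustration, a toy NumPy sketch of why a selective recurrence avoids the quadratic cost of attention: the state update is input-dependent ("selective"), yet the whole sequence is processed in a single O(T) scan with a fixed-size state, whereas self-attention forms O(T^2) pairwise interactions. The gating and projections below are illustrative assumptions, not the exact Mamba parameterization.

```python
import numpy as np

def selective_scan(x, W_a, W_b):
    """Toy selective recurrence over a sequence x of shape (T, d).

    At each step, a forget gate a_t and an input projection b_t are
    computed from x_t, so the update depends on the input, but the
    scan is still a single linear-time pass.
    """
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty((T, d))
    for t in range(T):
        a_t = 1.0 / (1.0 + np.exp(-(x[t] @ W_a)))   # forget gate in (0, 1)
        b_t = x[t] @ W_b                             # input projection
        h = a_t * h + (1.0 - a_t) * b_t              # gated state update
        out[t] = h
    return out

rng = np.random.default_rng(0)
T, d = 32, 4
x = rng.normal(size=(T, d))
y = selective_scan(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(y.shape)  # (32, 4): one pass, state size independent of T
```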