On the parameterization and initialization of diagonal state space models

A Gu, K Goel, A Gupta, C Ré - Advances in Neural …, 2022 - proceedings.neurips.cc
State space models (SSM) have recently been shown to be very effective as a deep learning
layer and a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
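
With a diagonal state matrix, the SSM layer decouples into independent scalar recurrences, which is why the paper's question is how those diagonal entries should be parameterized and initialized. A minimal NumPy sketch of such a layer under zero-order-hold discretization (names, sizes, and the real-valued initialization below are illustrative assumptions, not the paper's reference implementation):

    import numpy as np

    def diagonal_ssm(u, a, b, c, dt):
        # Zero-order-hold discretization of x'(t) = a*x(t) + b*u(t), applied
        # elementwise because the state matrix is diagonal (all entries nonzero).
        a_bar = np.exp(dt * a)              # discrete transition, one value per state
        b_bar = (a_bar - 1.0) / a * b       # discrete input coefficients
        x = np.zeros_like(a)
        ys = []
        for u_t in u:                       # sequential scan over a scalar input signal
            x = a_bar * x + b_bar * u_t     # independent scalar state updates
            ys.append(c @ x)                # readout y_t = c . x_t
        return np.array(ys)

    # Example: a length-64 input through an 8-state diagonal SSM.
    rng = np.random.default_rng(0)
    y = diagonal_ssm(rng.standard_normal(64), -np.arange(1.0, 9.0),
                     np.ones(8), rng.standard_normal(8), dt=0.1)

How the entries of a are chosen (real versus complex, and how they are spaced) is the design space the paper studies.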

Efficiently modeling long sequences with structured state spaces

A Gu, K Goel, C Ré - arXiv preprint arXiv:2111.00396, 2021 - arxiv.org
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …

How to train your HiPPO: State space models with generalized orthogonal basis projections

A Gu, I Johnson, A Timalsina, A Rudra, C Ré - arXiv preprint arXiv …, 2022 - arxiv.org
Linear time-invariant state space models (SSM) are classical models from engineering and
statistics that have recently been shown to be very promising in machine learning through …
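
For reference, the linear time-invariant SSM referred to here is the classical continuous-time system (generic textbook notation, not this paper's):

    x'(t) = A x(t) + B u(t)
    y(t)  = C x(t) + D u(t)

which maps an input signal u(t) to an output y(t) through a latent state x(t); the HiPPO framework in the title concerns choices of A and B under which x(t) maintains a projection of the history of u onto an orthogonal basis.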

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
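
The "selective" state spaces of the title make the SSM parameters functions of the current input, so the recurrence can decide what to store or forget based on content. A heavily simplified NumPy sketch of that idea (all names are hypothetical and the discretization is simplified; this is not Mamba's actual parameterization or its hardware-aware implementation):

    import numpy as np

    def selective_ssm(u, a, w_dt, w_b, w_c):
        # u: (T,) scalar input; a: (N,) fixed diagonal state matrix (negative reals).
        # The step size dt and the weights b, c are recomputed from each input value;
        # this input dependence is the "selection" mechanism.
        x = np.zeros(a.shape)
        ys = np.zeros(u.shape[0])
        for t, u_t in enumerate(u):
            dt = np.log1p(np.exp(w_dt * u_t))      # softplus keeps the step size positive
            b = w_b * u_t                          # input-dependent input weights, shape (N,)
            c = w_c * u_t                          # input-dependent readout weights, shape (N,)
            x = np.exp(dt * a) * x + dt * b * u_t  # discretized, input-dependent state update
            ys[t] = c @ x
        return ys

    # Toy sizes: a 4-state selective SSM over a length-32 sequence.
    rng = np.random.default_rng(0)
    y = selective_ssm(rng.standard_normal(32), -np.arange(1.0, 5.0),
                      0.5, rng.standard_normal(4), rng.standard_normal(4))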

MambaMixer: Efficient selective state space models with dual token and channel selection

A Behrouz, M Santacatterina, R Zabih - arXiv preprint arXiv:2403.19888, 2024 - arxiv.org
Recent advances in deep learning have mainly relied on Transformers due to their data
dependency and ability to learn at scale. The attention module in these architectures …

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
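
One detail that helps when comparing these layer variants: because the state update is linear, the recurrence x_t = a_t * x_{t-1} + b_t admits an associative combine rule and can therefore be evaluated with a parallel scan instead of a step-by-step loop, which is how S5-style layers avoid a sequential bottleneck at training time. A small NumPy check of the combine rule (illustrative only; real implementations use a parallel scan primitive such as JAX's associative scan):

    import numpy as np

    def combine(e1, e2):
        # Composing "apply (a1, b1), then (a2, b2)" to a state x:
        #   a2 * (a1 * x + b1) + b2 = (a1 * a2) * x + (a2 * b1 + b2)
        # The operator is associative, so the recurrence can be tree-reduced in parallel.
        a1, b1 = e1
        a2, b2 = e2
        return a1 * a2, a2 * b1 + b2

    rng = np.random.default_rng(0)
    a, b = rng.uniform(0.5, 0.99, 16), rng.standard_normal(16)
    x_seq = 0.0
    for a_t, b_t in zip(a, b):              # sequential reference
        x_seq = a_t * x_seq + b_t
    acc = (1.0, 0.0)                        # identity element of combine
    for a_t, b_t in zip(a, b):              # left fold with the associative operator
        acc = combine(acc, (a_t, b_t))
    assert np.isclose(x_seq, acc[1])        # same final state either way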

The hidden attention of mamba models

A Ali, I Zimerman, L Wolf - arXiv preprint arXiv:2403.01590, 2024 - arxiv.org
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective
in modeling multiple domains, including NLP, long-range sequence processing, and …

Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …

Block-state transformers

J Pilault, M Fathi, O Firat, C Pal… - Advances in Neural …, 2024 - proceedings.neurips.cc
State space models (SSMs) have shown impressive results on tasks that require modeling
long-range dependencies and efficiently scale to long sequences owing to their …

Latent matters: Learning deep state-space models

A Klushyn, R Kurle, M Soelch… - Advances in …, 2021 - proceedings.neurips.cc
Deep state-space models (DSSMs) enable temporal predictions by learning the underlying
dynamics of observed sequence data. They are often trained by maximising the evidence …
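
The truncated sentence presumably refers to maximizing a lower bound on the (log-)evidence. As a generic reminder (standard form, not necessarily this paper's exact objective), for a sequential latent-variable model with latents z_{1:T} and approximate posterior q_\phi the bound is

    \log p_\theta(x_{1:T}) \ge \mathbb{E}_{q_\phi(z_{1:T} \mid x_{1:T})}\left[\log p_\theta(x_{1:T} \mid z_{1:T})\right] - \mathrm{KL}\left(q_\phi(z_{1:T} \mid x_{1:T}) \,\|\, p_\theta(z_{1:T})\right).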