On the parameterization and initialization of diagonal state space models

A Gu, K Goel, A Gupta, C Ré - Advances in Neural …, 2022 - proceedings.neurips.cc
State space models (SSM) have recently been shown to be very effective as a deep learning
layer and a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
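
With a diagonal state matrix, the SSM layer decouples into independent scalar recurrences, which is why the paper's question is how those diagonal entries should be parameterized and initialized. A minimal NumPy sketch of such a layer under zero-order-hold discretization (names, sizes, and the real-valued initialization below are illustrative assumptions, not the paper's reference implementation):

    import numpy as np

    def diagonal_ssm(u, a, b, c, dt):
        # Zero-order-hold discretization of x'(t) = a*x(t) + b*u(t), applied
        # elementwise because the state matrix is diagonal (all entries nonzero).
        a_bar = np.exp(dt * a)              # discrete transition, one value per state
        b_bar = (a_bar - 1.0) / a * b       # discrete input coefficients
        x = np.zeros_like(a)
        ys = []
        for u_t in u:                       # sequential scan over a scalar input signal
            x = a_bar * x + b_bar * u_t     # independent scalar state updates
            ys.append(c @ x)                # readout y_t = c . x_t
        return np.array(ys)

    # Example: a length-64 input through an 8-state diagonal SSM.
    rng = np.random.default_rng(0)
    y = diagonal_ssm(rng.standard_normal(64), -np.arange(1.0, 9.0),
                     np.ones(8), rng.standard_normal(8), dt=0.1)

How the entries of a are chosen (real versus complex, and how they are spaced) is the design space the paper studies.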

Efficiently modeling long sequences with structured state spaces

A Gu, K Goel, C Ré - arXiv preprint arXiv:2111.00396, 2021 - arxiv.org
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …

How to train your HiPPO: State space models with generalized orthogonal basis projections

A Gu, I Johnson, A Timalsina, A Rudra, C Ré - arXiv preprint arXiv …, 2022 - arxiv.org
Linear time-invariant state space models (SSM) are classical models from engineering and
statistics that have recently been shown to be very promising in machine learning through …
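
For reference, the linear time-invariant SSM referred to here is the classical continuous-time system (generic textbook notation, not this paper's):

    x'(t) = A x(t) + B u(t)
    y(t)  = C x(t) + D u(t)

which maps an input signal u(t) to an output y(t) through a latent state x(t); the HiPPO framework in the title concerns choices of A and B under which x(t) maintains a projection of the history of u onto an orthogonal basis.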

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
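
The "selective" state spaces of the title make the SSM parameters functions of the current input, so the recurrence can decide what to store or forget based on content. A heavily simplified NumPy sketch of that idea (all names are hypothetical and the discretization is simplified; this is not Mamba's actual parameterization or its hardware-aware implementation):

    import numpy as np

    def selective_ssm(u, a, w_dt, w_b, w_c):
        # u: (T,) scalar input; a: (N,) fixed diagonal state matrix (negative reals).
        # The step size dt and the weights b, c are recomputed from each input value;
        # this input dependence is the "selection" mechanism.
        x = np.zeros(a.shape)
        ys = np.zeros(u.shape[0])
        for t, u_t in enumerate(u):
            dt = np.log1p(np.exp(w_dt * u_t))      # softplus keeps the step size positive
            b = w_b * u_t                          # input-dependent input weights, shape (N,)
            c = w_c * u_t                          # input-dependent readout weights, shape (N,)
            x = np.exp(dt * a) * x + dt * b * u_t  # discretized, input-dependent state update
            ys[t] = c @ x
        return ys

    # Toy sizes: a 4-state selective SSM over a length-32 sequence.
    rng = np.random.default_rng(0)
    y = selective_ssm(rng.standard_normal(32), -np.arange(1.0, 5.0),
                      0.5, rng.standard_normal(4), rng.standard_normal(4))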

MambaMixer: Efficient selective state space models with dual token and channel selection

A Behrouz, M Santacatterina, R Zabih - arXiv preprint arXiv:2403.19888, 2024 - arxiv.org
Recent advances in deep learning have mainly relied on Transformers due to their data
dependency and ability to learn at scale. The attention module in these architectures …

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
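
One detail that helps when comparing these layer variants: because the state update is linear, the recurrence x_t = a_t * x_{t-1} + b_t admits an associative combine rule and can therefore be evaluated with a parallel scan instead of a step-by-step loop, which is how S5-style layers avoid a sequential bottleneck at training time. A small NumPy check of the combine rule (illustrative only; real implementations use a parallel scan primitive such as JAX's associative scan):

    import numpy as np

    def combine(e1, e2):
        # Composing "apply (a1, b1), then (a2, b2)" to a state x:
        #   a2 * (a1 * x + b1) + b2 = (a1 * a2) * x + (a2 * b1 + b2)
        # The operator is associative, so the recurrence can be tree-reduced in parallel.
        a1, b1 = e1
        a2, b2 = e2
        return a1 * a2, a2 * b1 + b2

    rng = np.random.default_rng(0)
    a, b = rng.uniform(0.5, 0.99, 16), rng.standard_normal(16)
    x_seq = 0.0
    for a_t, b_t in zip(a, b):              # sequential reference
        x_seq = a_t * x_seq + b_t
    acc = (1.0, 0.0)                        # identity element of combine
    for a_t, b_t in zip(a, b):              # left fold with the associative operator
        acc = combine(acc, (a_t, b_t))
    assert np.isclose(x_seq, acc[1])        # same final state either way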

The hidden attention of mamba models

A Ali, I Zimerman, L Wolf - arXiv preprint arXiv:2403.01590, 2024 - arxiv.org
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective
in modeling multiple domains, including NLP, long-range sequence processing, and …

Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …

Block-state transformers

J Pilault, M Fathi, O Firat, C Pal… - Advances in Neural …, 2024 - proceedings.neurips.cc
State space models (SSMs) have shown impressive results on tasks that require modeling
long-range dependencies and efficiently scale to long sequences owing to their …

Latent matters: Learning deep state-space models

A Klushyn, R Kurle, M Soelch… - Advances in …, 2021 - proceedings.neurips.cc
Deep state-space models (DSSMs) enable temporal predictions by learning the underlying
dynamics of observed sequence data. They are often trained by maximising the evidence …
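
The truncated sentence presumably refers to maximizing a lower bound on the (log-)evidence. As a generic reminder (standard form, not necessarily this paper's exact objective), for a sequential latent-variable model with latents z_{1:T} and approximate posterior q_\phi the bound is

    \log p_\theta(x_{1:T}) \ge \mathbb{E}_{q_\phi(z_{1:T} \mid x_{1:T})}\left[\log p_\theta(x_{1:T} \mid z_{1:T})\right] - \mathrm{KL}\left(q_\phi(z_{1:T} \mid x_{1:T}) \,\|\, p_\theta(z_{1:T})\right).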