Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
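
For readers coming from the Transformer literature, the core object in this paper is a discretized state-space recurrence whose parameters are made input-dependent ("selective"). The sketch below is a plain sequential loop, not the paper's hardware-aware parallel scan; the projection names (`W_delta`, `W_B`, `W_C`) and shapes are illustrative assumptions rather than Mamba's exact parameterization.

```python
import numpy as np

def selective_ssm_scan(x, A, W_delta, W_B, W_C):
    """Minimal sketch of a selective SSM recurrence (sequential reference).

    x: (T, D) input; A: (D, N) diagonal state-matrix entries per channel
    (typically negative for stability); W_delta: (D, D), W_B: (D, N),
    W_C: (D, N) make the step size, B and C input-dependent.
    All names and shapes here are illustrative assumptions."""
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))                          # per-channel hidden state
    y = np.zeros((T, D))
    for t in range(T):
        delta = np.log1p(np.exp(x[t] @ W_delta))[:, None]  # softplus step size, (D, 1)
        B = x[t] @ W_B                                     # input-dependent B, (N,)
        C = x[t] @ W_C                                     # input-dependent C, (N,)
        A_bar = np.exp(delta * A)                          # zero-order-hold discretization of A
        B_bar = delta * B[None, :]                         # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * x[t][:, None]              # h_t = A_bar * h_{t-1} + B_bar * x_t
        y[t] = h @ C                                       # y_t = C_t h_t
    return y
```

Making delta, B and C functions of the input is what gives the "selective" in the title; with input-independent parameters the loop reduces to a standard linear time-invariant SSM that could equally be computed as a convolution.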

Vision mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …
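
The "bidirectional" part of the title refers to running a causal SSM scan both left-to-right and right-to-left over the flattened patch sequence and merging the two outputs. A structural sketch, assuming some causal `scan_fn` (e.g. the one sketched above) and a simple additive merge, which is an assumption rather than the model's exact combination rule:

```python
import numpy as np

def bidirectional_ssm(x, scan_fn, params_fwd, params_bwd):
    """Run a causal SSM scan in both directions over a patch sequence x (T, D)
    and combine the results. `scan_fn` stands in for any causal scan; the
    additive merge is an illustrative simplification."""
    y_fwd = scan_fn(x, *params_fwd)                  # forward (left-to-right) scan
    y_bwd = scan_fn(x[::-1], *params_bwd)[::-1]      # backward scan, then re-reversed
    return y_fwd + y_bwd
```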

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …
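
The layers revived in this line of work are linear recurrences with a diagonal (often complex-valued) transition, which are stable to optimize and admit parallel training. A minimal sequential sketch; the names and the simple parameterization below are illustrative, and the paper's stable exponential parameterization of the eigenvalues is omitted:

```python
import numpy as np

def linear_diagonal_rnn(x, lam, B, C):
    """Linear recurrence with a diagonal complex transition:
        h_t = lam * h_{t-1} + B x_t,   y_t = Re(C h_t).
    x: (T, D); lam: (N,) complex eigenvalues with |lam| < 1; B: (N, D); C: (D, N)."""
    T, D = x.shape
    h = np.zeros(lam.shape[0], dtype=complex)
    y = np.empty((T, D))
    for t in range(T):
        h = lam * h + B @ x[t]          # element-wise (diagonal) state update
        y[t] = (C @ h).real             # real-valued readout
    return y
```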

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
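
The reason these linear state-space layers train fast is that the recurrence h_t = a_t * h_{t-1} + b_t is associative and can be evaluated with a parallel prefix scan, which is the computational tool S5 leans on. The sketch below shows the scan for a diagonal transition using simple recursive doubling; it illustrates the idea rather than reproducing S5's actual implementation:

```python
import numpy as np

def diag_ssm_prefix_scan(a, b):
    """Prefix scan for h_t = a_t * h_{t-1} + b_t with diagonal a (h_0 = 0).
    Elements (a, b) compose as (a2, b2) o (a1, b1) = (a2 * a1, a2 * b1 + b2),
    which is associative, so recursive doubling (Hillis-Steele) applies.
    a, b: (T, N); returns h of shape (T, N)."""
    a = a.copy()
    b = b.copy()
    T = a.shape[0]
    shift = 1
    while shift < T:
        # combine each element with the one `shift` steps earlier
        b[shift:] = a[shift:] * b[:-shift] + b[shift:]
        a[shift:] = a[shift:] * a[:-shift]
        shift *= 2
    return b
```

A plain loop h_t = a_t * h_{t-1} + b_t produces the same output and is an easy correctness check; on parallel hardware the doubling form runs in O(log T) steps.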

Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …
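
The building block here is an element-wise gated linear recurrence whose forget gate is lower-bounded, with the bound intended to grow with layer depth so that upper layers retain longer memories. A minimal sketch; the gate parameterization below is an illustrative assumption, not the paper's exact hierarchical construction:

```python
import numpy as np

def gated_linear_recurrence(x, W_f, W_c, gamma):
    """Element-wise gated linear recurrence h_t = f_t * h_{t-1} + (1 - f_t) * c_t,
    with the forget gate f_t lower-bounded by gamma in [0, 1). In the hierarchical
    scheme, gamma would increase from lower to upper layers (illustrative sketch).
    x: (T, D); W_f, W_c: (D, D)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    T, D = x.shape
    h = np.zeros(D)
    out = np.empty((T, D))
    for t in range(T):
        f = gamma + (1.0 - gamma) * sigmoid(x[t] @ W_f)   # forget gate, always >= gamma
        c = x[t] @ W_c                                    # candidate input branch
        h = f * h + (1.0 - f) * c
        out[t] = h
    return out
```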

A survey on vision mamba: Models, applications and challenges

R Xu, S Yang, Y Wang, B Du, H Chen - arXiv preprint arXiv:2404.18861, 2024 - arxiv.org
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …

Monarch mixer: A simple sub-quadratic gemm-based architecture

D Fu, S Arora, J Grogan, I Johnson… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
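
The sub-quadratic operator referred to in the title is built from Monarch matrices: products of block-diagonal matrices and permutations that can be applied with ordinary batched GEMMs. The sketch below shows the general reshape / block-multiply / transpose pattern for a length n = m*m input; the exact factorization and ordering used by M2 may differ, so treat this as an illustration of the structure only:

```python
import numpy as np

def monarch_like_mix(x, blocks_1, blocks_2):
    """Apply a Monarch-style structured operator to a length m*m vector using
    only small dense matmuls (block-diagonal mixes interleaved with a transpose).
    blocks_1, blocks_2: (m, m, m), i.e. m dense blocks of shape (m, m) each."""
    m = blocks_1.shape[0]
    z = x.reshape(m, m)                          # view the vector as an m x m grid
    z = np.einsum('bij,bj->bi', blocks_1, z)     # first block-diagonal mix
    z = z.T                                      # fixed permutation: transpose the grid
    z = np.einsum('bij,bj->bi', blocks_2, z)     # second block-diagonal mix
    return z.reshape(-1)                         # flatten back (up to a fixed permutation)
```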

Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

Simple hardware-efficient long convolutions for sequence modeling

DY Fu, EL Epstein, E Nguyen… - International …, 2023 - proceedings.mlr.press
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …
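
The alternative explored here is to parameterize the sequence mixer directly as a long (sequence-length) convolution and evaluate it with FFTs in O(T log T). A minimal sketch of an FFT-based causal depthwise long convolution; the paper's regularization and initialization recipes are omitted:

```python
import numpy as np

def fft_long_conv(x, k):
    """Causal depthwise long convolution via FFT.
    x: (T, D) input sequence; k: (T, D) per-channel kernel as long as the input.
    Zero-padding to 2*T avoids circular wrap-around, so the first T outputs match
    the causal linear convolution y_t = sum_{s<=t} k_s * x_{t-s}."""
    T = x.shape[0]
    n = 2 * T
    X = np.fft.rfft(x, n=n, axis=0)
    K = np.fft.rfft(k, n=n, axis=0)
    return np.fft.irfft(X * K, n=n, axis=0)[:T]
```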

Gated linear attention transformers with hardware-efficient training

S Yang, B Wang, Y Shen, R Panda, Y Kim - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers with linear attention allow for efficient parallel training but can simultaneously
be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear (with …
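
The RNN view mentioned in the snippet keeps a d_k x d_v matrix-valued state that accumulates outer products of keys and values; adding a data-dependent decay gives the gated form this paper studies. A minimal sequential sketch (the gate parameterization is an illustrative assumption, and the paper's hardware-efficient chunked training algorithm is not shown):

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """Linear attention as an RNN with a matrix-valued hidden state:
        S_t = diag(g_t) S_{t-1} + k_t v_t^T,   y_t = S_t^T q_t.
    q, k, g: (T, d_k) with gates g in (0, 1); v: (T, d_v).
    Setting g_t = 1 recovers ungated linear attention."""
    d_k, d_v = q.shape[1], v.shape[1]
    S = np.zeros((d_k, d_v))                            # matrix-valued state
    y = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        S = g[t][:, None] * S + np.outer(k[t], v[t])    # gated state update
        y[t] = S.T @ q[t]                               # read out with the query
    return y
```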