A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
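
The snippet stops before describing the method itself, but the title's "linear-time" claim rests on replacing attention with a state-space recurrence that does a fixed amount of work per token. Below is a minimal sketch of a generic discretized linear SSM scan; it only illustrates that linear-time idea, with arbitrary fixed (non-selective) A, B, C matrices, and is not Mamba's input-dependent selective parameterization.

```python
import numpy as np

def linear_ssm_scan(x, A, B, C):
    """Generic discretized state-space recurrence (a sketch, not Mamba's
    selective variant): h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t.
    Constant cost per step => O(sequence_length) time overall."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # single pass over the sequence
        h = A @ h + B @ x_t       # state update
        ys.append(C @ h)          # readout
    return np.stack(ys)

# Toy usage: 1-D input channel, 4-D hidden state (all values are placeholders).
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)               # stable fixed dynamics (assumption)
B = rng.normal(size=(4, 1))
C = rng.normal(size=(1, 4))
x = rng.normal(size=(1000, 1))    # sequence of length 1000
y = linear_ssm_scan(x, A, B, C)   # y.shape == (1000, 1)
```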

RWKV: Reinventing RNNs for the transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …
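
The quadratic scaling mentioned in the snippet comes from attention materializing an n×n score matrix over the sequence. The small NumPy sketch below of vanilla single-head attention makes that cost explicit; shapes and sizes are illustrative, and no RWKV-specific machinery is shown.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Vanilla single-head attention (illustrative sketch).
    `scores` has shape (n, n), so time and memory grow quadratically
    with the sequence length n -- the cost RNN-style models avoid."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d) output

n, d = 2048, 64                                        # toy sizes
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = naive_attention(Q, K, V)                         # materializes a 2048 x 2048 matrix
```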

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …

Protein design with guided discrete diffusion

N Gruver, S Stanton, N Frey… - Advances in neural …, 2024 - proceedings.neurips.cc
A popular approach to protein design is to combine a generative model with a discriminative
model for conditional sampling. The generative model samples plausible sequences while …
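
The snippet describes the general recipe of pairing a generative model with a discriminative model for conditional sampling. The placeholder code below illustrates only the simplest form of that idea, propose-then-rerank, and is not the paper's guided discrete diffusion procedure; `generate` and `score` are hypothetical callables standing in for the two models.

```python
import heapq
from typing import Callable, Sequence

def conditional_sample(
    generate: Callable[[], str],        # placeholder: draws one plausible sequence
    score: Callable[[str], float],      # placeholder: discriminator's fitness score
    n_candidates: int = 256,
    top_k: int = 8,
) -> Sequence[str]:
    """Simplest generative + discriminative recipe (a sketch, not guided
    discrete diffusion): propose sequences with the generative model,
    keep the candidates the discriminative model scores highest."""
    candidates = [generate() for _ in range(n_candidates)]
    return heapq.nlargest(top_k, candidates, key=score)
```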

Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …

A survey on Vision Mamba: Models, applications and challenges

R Xu, S Yang, Y Wang, B Du, H Chen - arXiv preprint arXiv:2404.18861, 2024 - arxiv.org
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …

Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

Laughing hyena distillery: Extracting compact recurrences from convolutions

S Massaroli, M Poli, D Fu, H Kumbong… - Advances in …, 2024 - proceedings.neurips.cc
Recent advances in attention-free sequence models rely on convolutions as alternatives to
the attention operator at the core of Transformers. In particular, long convolution sequence …
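
Long-convolution sequence models, as referenced in the snippet, apply a filter roughly as long as the input; the standard way to evaluate this efficiently is FFT-based convolution in O(n log n). The sketch below shows only that generic trick, not the paper's distillation of convolutions into compact recurrences; the decaying filter is a toy stand-in.

```python
import numpy as np

def long_causal_conv(x, h):
    """Causal convolution of a length-n signal x with a length-n filter h
    via FFT in O(n log n) -- the usual trick behind long-convolution
    sequence layers (generic sketch, unrelated to the distillation method)."""
    n = len(x)
    m = 2 * n                                   # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, m) * np.fft.rfft(h, m), m)
    return y[:n]                                # keep the causal part

rng = np.random.default_rng(0)
x = rng.normal(size=4096)                       # input sequence
h = np.exp(-0.01 * np.arange(4096))             # toy decaying filter (assumption)
y = long_causal_conv(x, h)                      # matches direct O(n^2) convolution, truncated to n
```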

Exposing attention glitches with flip-flop language modeling

B Liu, J Ash, S Goel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Why do large language models sometimes output factual inaccuracies and exhibit
erroneous reasoning? The brittleness of these models, particularly when executing long …