Hyena hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …
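
The snippet cuts off before naming it, but Hyena's core move is to replace attention with implicitly parameterized long convolutions evaluated in O(L log L) time via the FFT, interleaved with data-controlled elementwise gating. A minimal sketch of that recipe; the projections and the decaying random filter below are stand-ins for the paper's learned filter parameterization, not its actual construction:

```python
import numpy as np

def fft_causal_conv(u, k):
    """Linear (causal) 1-D convolution via FFT: O(L log L) even when
    the filter is as long as the input."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT convolution becomes linear
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[..., :L]

rng = np.random.default_rng(0)
d, L = 8, 1024
u = rng.standard_normal((d, L))
W_v = rng.standard_normal((d, d)) / np.sqrt(d)   # value projection (stand-in)
W_g = rng.standard_normal((d, d)) / np.sqrt(d)   # gate projection (stand-in)
k = rng.standard_normal((d, L)) * np.exp(-0.01 * np.arange(L))  # long decaying filter

y = (W_g @ u) * fft_causal_conv(W_v @ u, k)  # gated global convolution
print(y.shape)  # (8, 1024)
```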

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2024 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …
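
HyenaDNA operates directly on individual nucleotides rather than on k-mer chunks, which is what "single nucleotide resolution" refers to. A toy sketch of what character-level tokenization of DNA looks like; the vocabulary here is illustrative, not the released model's actual tokenizer:

```python
# Map each base to an integer id; "N" stands in for unknown bases.
VOCAB = {ch: i for i, ch in enumerate("ACGTN")}

def encode(seq: str) -> list[int]:
    return [VOCAB.get(ch, VOCAB["N"]) for ch in seq.upper()]

print(encode("acgTTgacN"))  # [0, 1, 2, 3, 3, 2, 0, 1, 4]
```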

Scaling up your kernels to 31×31: Revisiting large kernel design in CNNs

X Ding, X Zhang, J Han, G Ding - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
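
The paper's recipe pairs a very large depthwise kernel (up to 31×31) with a small parallel kernel during training, then folds the small branch into the large one for inference via structural re-parameterization, which works because convolution is linear in the kernel. A sketch of that merge step (BatchNorm fusion and the depthwise channel dimension omitted):

```python
import numpy as np

def merge_small_into_large(k_large, k_small):
    """Fold a parallel small-kernel branch into the large kernel by
    zero-padding it to the same size and summing. Assumes odd,
    centered square kernels."""
    K, s = k_large.shape[0], k_small.shape[0]
    pad = (K - s) // 2
    merged = k_large.copy()
    merged[pad:pad + s, pad:pad + s] += k_small
    return merged

rng = np.random.default_rng(0)
k31 = rng.standard_normal((31, 31))
k3 = rng.standard_normal((3, 3))
k = merge_small_into_large(k31, k3)
print(k.shape)  # (31, 31): two branches in training, one kernel at inference
```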

On the parameterization and initialization of diagonal state space models

A Gu, K Goel, A Gupta, C Ré - Advances in Neural …, 2022 - proceedings.neurips.cc
State space models (SSMs) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
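
With a diagonal state matrix, the SSM's convolution kernel reduces to a Vandermonde-style sum over the discretized eigenvalues, which is what makes diagonal parameterizations so cheap. A sketch with random stand-ins for the paper's carefully chosen initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, dt = 16, 512, 1e-2
a = -np.abs(rng.standard_normal(N)) + 1j * rng.standard_normal(N)  # stable: Re(a) < 0
B, C = rng.standard_normal(N), rng.standard_normal(N)

abar = np.exp(dt * a)            # discretized eigenvalues
Bbar = (abar - 1) / a * B        # zero-order-hold discretization of B
# K[l] = sum_n C_n * Bbar_n * abar_n**l
K = ((C * Bbar) @ (abar[:, None] ** np.arange(L))).real
print(K.shape)  # (512,) — applied to the input as an ordinary 1-D convolution
```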

Efficiently modeling long sequences with structured state spaces

A Gu, K Goel, C Ré - arXiv preprint arXiv:2111.00396, 2021 - arxiv.org
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …
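
The object behind S4 is a linear state space recurrence, which unrolls into a convolution whose kernel is (CB, CAB, CA²B, ...); S4's contribution is computing that kernel efficiently for a structured A. The naive sketch below just checks that the recurrent and convolutional views agree on a tiny random system:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 8
A = 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)
B, C = rng.standard_normal((N, 1)), rng.standard_normal((1, N))

# Convolutional view: kernel K[l] = C A^l B (naive; S4 does this fast).
K = np.array([(C @ np.linalg.matrix_power(A, l) @ B).item() for l in range(L)])
u = rng.standard_normal(L)
y_conv = np.convolve(u, K)[:L]

# Recurrent view: x_t = A x_{t-1} + B u_t,  y_t = C x_t.
x, y_rec = np.zeros((N, 1)), []
for t in range(L):
    x = A @ x + B * u[t]
    y_rec.append((C @ x).item())

assert np.allclose(y_conv, np.array(y_rec))  # the two views coincide
```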

S4ND: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in neural …, 2022 - proceedings.neurips.cc
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …
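
S4ND extends 1-D SSM layers to multidimensional signals; one way to read the construction is that per-axis kernels combine into a separable global N-D kernel, so a global 2-D convolution factors into two cheap 1-D ones. A sketch of that factorization only (outer-product kernels; not the paper's full construction):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
kh, kw = rng.standard_normal(17), rng.standard_normal(17)  # per-axis 1-D kernels
K2d = np.outer(kh, kw)                                     # separable 2-D kernel

img = rng.standard_normal((32, 32))
direct = fftconvolve(img, K2d)                                       # one 2-D conv
separable = fftconvolve(fftconvolve(img, kh[:, None]), kw[None, :])  # two 1-D convs
assert np.allclose(direct, separable)
```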

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
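
S5 replaces S4's bank of single-input SSMs with one multi-input SSM computed via a parallel scan. A scan applies because linear recurrence steps compose under an associative operator; the sketch below checks that operator against the sequential recurrence (the parallelization itself is omitted):

```python
import numpy as np

# Elements (A, b) compose as (A1, b1) ∘ (A2, b2) = (A2 @ A1, A2 @ b1 + b2),
# so prefix states of x_t = A x_{t-1} + b_t admit an O(log L)-depth scan.
def combine(e1, e2):
    A1, b1 = e1
    A2, b2 = e2
    return A2 @ A1, A2 @ b1 + b2

rng = np.random.default_rng(0)
N, L = 3, 6
A = 0.5 * rng.standard_normal((N, N))
bs = rng.standard_normal((L, N))

elem = (np.eye(N), np.zeros(N))  # identity element of the operator
x = np.zeros(N)
for t in range(L):
    elem = combine(elem, (A, bs[t]))
    x = A @ x + bs[t]
    assert np.allclose(elem[1], x)  # running element carries the state x_t
```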

Combining recurrent, convolutional, and continuous-time models with linear state space layers

A Gu, I Johnson, K Goel, K Saab… - Advances in neural …, 2021 - proceedings.neurips.cc
Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations
(NDEs) are popular families of deep learning models for time-series data, each with unique …
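
The link between the continuous-time, recurrent, and convolutional views runs through discretization. A sketch of the bilinear transform used in this line of work, which maps a stable continuous A (eigenvalues in the left half-plane) to a stable discrete Abar (spectral radius below one); the matrices are random stand-ins:

```python
import numpy as np

# Bilinear discretization of x'(t) = A x(t) + B u(t) with step dt:
#   Abar = (I - dt/2 A)^{-1} (I + dt/2 A),  Bbar = (I - dt/2 A)^{-1} dt B
rng = np.random.default_rng(0)
N, dt = 4, 0.1
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))  # stable continuous system
B = rng.standard_normal((N, 1))

inv = np.linalg.inv(np.eye(N) - dt / 2 * A)
Abar = inv @ (np.eye(N) + dt / 2 * A)
Bbar = inv @ (dt * B)
print(np.abs(np.linalg.eigvals(Abar)).max())  # < 1: the recurrence is stable
```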

Mega: moving average equipped gated attention

X Ma, C Zhou, X Kong, J He, L Gui, G Neubig… - arXiv preprint arXiv …, 2022 - arxiv.org
The design choices in the Transformer attention mechanism, including weak inductive bias
and quadratic computational complexity, have limited its application for modeling long …
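
The inductive bias Mega adds is a learned, damped exponential moving average applied before attention. A one-dimensional sketch of that EMA (Mega learns multi-dimensional α and δ per feature; the constants here are arbitrary):

```python
import numpy as np

def damped_ema(x, alpha=0.3, delta=0.9):
    """h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}."""
    h, out = 0.0, []
    for xt in x:
        h = alpha * xt + (1.0 - alpha * delta) * h
        out.append(h)
    return np.array(out)

# For a constant unit input the EMA converges to 1/delta ≈ 1.111.
print(damped_ema(np.ones(5)))
```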

Monarch Mixer: A simple sub-quadratic GEMM-based architecture

D Fu, S Arora, J Grogan, I Johnson… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
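
The sub-quadratic, GEMM-friendly primitive behind the title is the Monarch matrix family: for n = m², a product of block-diagonal matrices interleaved with a fixed grid-transpose permutation, dropping a dense n×n multiply from O(n²) to O(n^{3/2}) while keeping every step a dense block GEMM. A sketch of one such multiply with random blocks (a simplified variant, not the paper's exact factorization):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                 # n = m * m = 16
L = rng.standard_normal((m, m, m))    # m left blocks, each m x m
R = rng.standard_normal((m, m, m))    # m right blocks, each m x m

def monarch_apply(x):
    X = x.reshape(m, m)
    X = np.einsum("bij,bj->bi", R, X)  # block-diagonal multiply
    X = X.T                            # fixed permutation: grid transpose
    X = np.einsum("bij,bj->bi", L, X)  # second block-diagonal multiply
    return X.T.reshape(-1)

# 2 * m blocks of m x m work: O(n^{3/2}) instead of O(n^2).
print(monarch_apply(rng.standard_normal(m * m)).shape)  # (16,)
```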