Foundational models defining a new era in vision: A survey and outlook

M Awais, M Naseer, S Khan, RM Anwer… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

A survey on visual Mamba

H Zhang, Y Zhu, D Wang, L Zhang, T Chen, Z Wang… - Applied Sciences, 2024 - mdpi.com
State space models (SSMs) with selection mechanisms and hardware-aware architectures,
namely Mamba, have recently shown significant potential in long-sequence modeling. Since …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
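
For reference, the linear-time recurrence behind the title is the discretized state-space update; the sketch below is ours, assuming the standard zero-order-hold discretization, with Mamba's selection mechanism additionally making \Delta, B, and C functions of the input rather than constants:

    h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t,
    \qquad \bar{A} = \exp(\Delta A), \quad \bar{B} \approx \Delta B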

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

Scaling Transformer to 1M tokens and beyond with RMT

A Bulatov, Y Kuratov, Y Kapushev… - arXiv preprint arXiv …, 2023 - arxiv.org
A major limitation for the broader scope of problems solvable by transformers is the
quadratic scaling of computational complexity with input size. In this study, we investigate …
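
As a rough worked example of that scaling (standard dense-attention cost, not taken from this abstract): full self-attention materializes an L x L score matrix per head, so

    \text{cost} \sim O(L^2 d), \qquad L = 10^6 \;\Rightarrow\; L^2 = 10^{12} \text{ pairwise scores},

which is why contexts of this length call for a recurrent memory scheme rather than dense attention.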

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Gated linear attention transformers with hardware-efficient training

S Yang, B Wang, Y Shen, R Panda, Y Kim - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers with linear attention allow for efficient parallel training but can simultaneously
be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear (with …
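
A minimal NumPy sketch of the equivalence the snippet describes, assuming the ungated, unnormalized form of causal linear attention (no feature map; the gating that gives GLA its name is omitted, and all names and shapes are illustrative):

import numpy as np

def linear_attention_parallel(Q, K, V):
    # Parallel form: masked (Q K^T) V with no softmax; builds an O(L^2) score matrix.
    L = Q.shape[0]
    scores = Q @ K.T                      # (L, L) pairwise dot products
    mask = np.tril(np.ones((L, L)))       # causal mask
    return (scores * mask) @ V            # (L, d_v)

def linear_attention_recurrent(Q, K, V):
    # Recurrent form: S_t = S_{t-1} + k_t v_t^T, y_t = q_t^T S_t.
    # The hidden state S is a 2D (d_k x d_v) matrix, so total cost is O(L d_k d_v).
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    Y = np.zeros((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S = S + np.outer(K[t], V[t])      # rank-1 state update
        Y[t] = Q[t] @ S                   # read out with the query
    return Y

# The two forms agree up to floating-point error.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
assert np.allclose(linear_attention_parallel(Q, K, V),
                   linear_attention_recurrent(Q, K, V))

The recurrent form carries only a d_k x d_v state between steps, which keeps inference memory constant in sequence length, while the masked matrix-multiply form is what permits efficient parallel training.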

Towards graph foundation models: A survey and beyond

J Liu, C Yang, Z Lu, J Chen, Y Li, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Emerging as fundamental building blocks for diverse artificial intelligence applications,
foundation models have achieved notable success across natural language processing and …