A Survey of Mamba

H Qu, L Ning, R An, W Fan, T Derr, H Liu, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most representative deep learning (DL) techniques, the Transformer architecture has empowered
numerous advanced models, especially the large language models (LLMs) that comprise …

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

X Wang, F Wang, Y Li, Q Ma, S Wang, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
X-ray image-based medical report generation (MRG) is a pivotal area of artificial intelligence
that can significantly reduce diagnostic burdens and patient wait times. Despite significant …

A Survey on Vision Autoregressive Model

K Jiang, J Huang - arXiv preprint arXiv:2411.08666, 2024 - arxiv.org
Autoregressive models have demonstrated strong performance in natural language
processing (NLP) with impressive scalability, adaptability, and generalizability. Inspired by …

Causal Image Modeling for Efficient Visual Understanding

F Wang, T Yang, Y Yu, S Ren, G Wei, A Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we present a comprehensive analysis of causal image modeling and introduce
the Adventurer series of models, where we treat images as sequences of patch tokens and …

Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation

R He, W Zheng - arXiv preprint arXiv:2501.14679, 2025 - arxiv.org
Attention-based methods have demonstrated exceptional performance in modelling
long-range dependencies on spherical cortical surfaces, surpassing traditional Geometric Deep …

MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining

Y Liu, L Yi - arXiv preprint arXiv:2410.00871, 2024 - arxiv.org
Mamba has achieved significant advantages in long-context modeling and autoregressive
tasks, but its scalability to large parameter counts remains a major limitation in vision …