- Academic resource search

Feature dimensionality reduction: a review

W Jia, M Sun, J Lian, S Hou - Complex & Intelligent Systems, 2022 - Springer

As basic research, it has also received increasing attention from people that the “curse of
dimensionality” will lead to increase the cost of data storage and computing; it also …

Cited by 366 · Related articles · All 4 versions

[PDF] springer.com

Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer

With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Cited by 137 · Related articles · All 8 versions

[PDF] thecvf.com

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Cited by 587 · Related articles · All 8 versions

[PDF] neurips.cc

SegNeXt: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc

We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …

Cited by 529 · Related articles · All 6 versions

[PDF] arxiv.org

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently the state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …

Cited by 459 · Related articles · All 5 versions

[PDF] nature.com

Illuminating protein space with a programmable generative model

JB Ingraham, M Baranov, Z Costello, KW Barber… - Nature, 2023 - nature.com

Three billion years of evolution has produced a tremendous diversity of protein molecules,
but the full potential of proteins is likely to be much greater. Accessing this potential has …

Cited by 258 · Related articles · All 17 versions

[PDF] arxiv.org

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org

This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

Cited by 484 · Related articles · All 3 versions

[PDF] neurips.cc

HorNet: Efficient high-order spatial interactions with recursive gated convolutions

Y Rao, W Zhao, Y Tang, J Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc

Recent progress in vision Transformers exhibits great success in various tasks driven by the
new spatial modeling mechanism based on dot-product self-attention. In this paper, we …

Cited by 274 · Related articles · All 5 versions

[PDF] arxiv.org

MaxViT: Multi-axis vision transformer

Z Tu, H Talebi, H Zhang, F Yang, P Milanfar… - European conference on …, 2022 - Springer

Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …

Cited by 580 · Related articles · All 8 versions

[PDF] springer.com

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer

While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Cited by 637 · Related articles · All 8 versions
