S$^2$-MLPv2: improved spatial-shift MLP architecture for vision

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com

Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Cited by 74 - Related articles - All 7 versions

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Cited by 207 - Related articles - All 6 versions

Hire-MLP: Vision MLP via hierarchical rearrangement

J Guo, Y Tang, K Han, X Chen, H Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image
patches as input, making them inflexible for different input sizes and hard to capture spatial …

Cited by 114 - Related articles - All 6 versions

PointMixer: MLP-Mixer for point cloud understanding

J Choe, C Park, F Rameau, J Park… - European Conference on …, 2022 - Springer

MLP-Mixer has newly appeared as a new challenger against the realm of CNNs and
Transformer. Despite its simplicity compared to Transformer, the concept of channel-mixing …

Cited by 97 - Related articles - All 7 versions

Sequencer: Deep LSTM for image classification

Y Tatsunami, M Taki - Advances in Neural Information …, 2022 - proceedings.neurips.cc

In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly
revolutionized various architectural design efforts: ViT achieved state-of-the-art image …

Cited by 69 - Related articles - All 5 versions

DynaMixer: A vision MLP architecture with dynamic mixing

Z Wang, W Jiang, YM Zhu, L Yuan… - … on machine learning, 2022 - proceedings.mlr.press

Recently, MLP-like vision models have achieved promising performances on mainstream
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …

Cited by 36 - Related articles - All 5 versions

PS-Mixer: A polar-vector and strength-vector mixer model for multimodal sentiment analysis

H Lin, P Zhang, J Ling, Z Yang, LK Lee, W Liu - Information Processing & …, 2023 - Elsevier

Multimodal sentiment analysis aims to judge the sentiment of multimodal data uploaded by
the Internet users on various social media platforms. On one hand, existing studies focus on …

Cited by 25 - Related articles - All 2 versions

BOAT: Bilateral local attention vision transformer

T Yu, G Zhao, P Li, Y Yu - arXiv preprint arXiv:2201.13027, 2022 - arxiv.org

Vision Transformers achieved outstanding performance in many computer vision tasks.
Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is …

Cited by 31 - Related articles - All 3 versions

MLP-3D: A MLP-like 3D architecture with grouped time mixing

Z Qiu, T Yao, CW Ngo, T Mei - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

Convolutional Neural Networks (CNNs) have been regarded as the go-to models
for visual recognition. More recently, convolution-free networks, based on multi-head self …

Cited by 19 - Related articles - All 7 versions

Multilinear operator networks

Y Cheng, GG Chrysos, M Georgopoulos… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite the remarkable capabilities of deep neural networks in image recognition, the
dependence on activation functions remains a largely unexplored area and has yet to be …

Cited by 8 - Related articles - All 3 versions
