Are we ready for a new paradigm shift? A survey on visual deep MLP

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Hire-MLP: Vision MLP via hierarchical rearrangement

J Guo, Y Tang, K Han, X Chen, H Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image
patches as input, making them inflexible for different input sizes and hard to capture spatial …

PointMixer: MLP-Mixer for point cloud understanding

J Choe, C Park, F Rameau, J Park… - European Conference on …, 2022 - Springer
MLP-Mixer has newly appeared as a new challenger against the realm of CNNs and
Transformer. Despite its simplicity compared to Transformer, the concept of channel-mixing …

Sequencer: Deep LSTM for image classification

Y Tatsunami, M Taki - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly
revolutionized various architectural design efforts: ViT achieved state-of-the-art image …

DynaMixer: A vision MLP architecture with dynamic mixing

Z Wang, W Jiang, YM Zhu, L Yuan… - … on machine learning, 2022 - proceedings.mlr.press
Recently, MLP-like vision models have achieved promising performances on mainstream
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …

PS-Mixer: A polar-vector and strength-vector mixer model for multimodal sentiment analysis

H Lin, P Zhang, J Ling, Z Yang, LK Lee, W Liu - Information Processing & …, 2023 - Elsevier
Multimodal sentiment analysis aims to judge the sentiment of multimodal data uploaded by
the Internet users on various social media platforms. On one hand, existing studies focus on …

BOAT: Bilateral local attention vision transformer

T Yu, G Zhao, P Li, Y Yu - arXiv preprint arXiv:2201.13027, 2022 - arxiv.org
Vision Transformers achieved outstanding performance in many computer vision tasks.
Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is …

MLP-3D: A MLP-like 3D architecture with grouped time mixing

Z Qiu, T Yao, CW Ngo, T Mei - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
Convolutional Neural Networks (CNNs) have been regarded as the go-to models
for visual recognition. More recently, convolution-free networks, based on multi-head self …

Multilinear operator networks

Y Cheng, GG Chrysos, M Georgopoulos… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the remarkable capabilities of deep neural networks in image recognition, the
dependence on activation functions remains a largely unexplored area and has yet to be …