Are we ready for a new paradigm shift? A survey on visual deep MLP

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

S²-MLP: Spatial-shift MLP architecture for vision

T Yu, X Li, Y Cai, M Sun, P Li - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Abstract Recently, visual Transformer (ViT) and its following works abandon the convolution
and exploit the self-attention operation, attaining a comparable or even higher accuracy than …

S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision

T Yu, X Li, Y Cai, M Sun, P Li - arXiv preprint arXiv:2108.01072, 2021 - arxiv.org
Recently, MLP-based vision backbones emerge. MLP-based vision architectures with less
inductive bias achieve competitive performance in image recognition compared with CNNs …

Fruit ripeness identification using transformers

B Xiao, M Nguyen, WQ Yan - Applied Intelligence, 2023 - Springer
Pattern classification has always been essential in computer vision. Transformer paradigm
having attention mechanism with global receptive field in computer vision improves the …

SLT-Net: A codec network for skin lesion segmentation

K Feng, L Ren, G Wang, H Wang, Y Li - Computers in Biology and Medicine, 2022 - Elsevier
Automatic segmentation of skin lesions is beneficial for improving the accuracy and
efficiency of melanoma diagnosis. However, due to variation in the size and shape of the …

PASS: Patch automatic skip scheme for efficient on-device video perception

Q Zhou, S Guo, J Pan, J Liang, J Guo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Real-time video perception tasks are often challenging on resource-constrained edge
devices due to the issues of accuracy drop and hardware overhead, where saving …

BOAT: Bilateral local attention vision transformer

T Yu, G Zhao, P Li, Y Yu - arXiv preprint arXiv:2201.13027, 2022 - arxiv.org
Vision Transformers achieved outstanding performance in many computer vision tasks.
Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is …

SPANet: Frequency-balancing token mixer using spectral pooling aggregation modulation

G Yun, J Yoo, K Kim, J Lee… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recent studies show that self-attentions behave like low-pass filters (as opposed to
convolutions) and enhancing their high-pass filtering capability improves model …

Next generation of computer vision for plant disease monitoring in precision agriculture: A contemporary survey, taxonomy, experiments, and future direction

W Ding, M Abdel-Basset, I Alrashdi, H Hawash - Information Sciences, 2024 - Elsevier
Efficient and rational monitoring of plant health is an essential prerequisite for ensuring
optimal crop production and resource management in the field of agriculture. Computer …

Multigrained hybrid neural network for rotating machinery fault diagnosis using joint local and global information

Z Yang, B He, G Li, P Lu, B Cheng… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning (DL) models, such as multilayer perceptrons (MLPs) and convolutional neural
networks (CNNs), have strong feature representation and nonlinear mapping capabilities …