Rethinking token-mixing MLP for MLP-based vision backbone

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com

Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Cited by 73 Related articles All 7 versions

[PDF] thecvf.com

S2-MLP: Spatial-shift MLP architecture for vision

T Yu, X Li, Y Cai, M Sun, P Li - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com

Recently, visual Transformer (ViT) and its following works abandon the convolution
and exploit the self-attention operation, attaining a comparable or even higher accuracy than …

Cited by 202 Related articles All 7 versions

[PDF] arxiv.org

S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision

T Yu, X Li, Y Cai, M Sun, P Li - arXiv preprint arXiv:2108.01072, 2021 - arxiv.org

Recently, MLP-based vision backbones emerge. MLP-based vision architectures with less
inductive bias achieve competitive performance in image recognition compared with CNNs …

Cited by 53 Related articles All 2 versions

[HTML] springer.com

[HTML] Fruit ripeness identification using transformers

B Xiao, M Nguyen, WQ Yan - Applied Intelligence, 2023 - Springer

Pattern classification has always been essential in computer vision. Transformer paradigm
having attention mechanism with global receptive field in computer vision improves the …

Cited by 16 Related articles All 5 versions

SLT-Net: A codec network for skin lesion segmentation

K Feng, L Ren, G Wang, H Wang, Y Li - Computers in Biology and Medicine, 2022 - Elsevier

Automatic segmentation of skin lesions is beneficial for improving the accuracy and
efficiency of melanoma diagnosis. However, due to variation in the size and shape of the …

Cited by 22 Related articles All 4 versions

PASS: Patch automatic skip scheme for efficient on-device video perception

Q Zhou, S Guo, J Pan, J Liang, J Guo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Real-time video perception tasks are often challenging on resource-constrained edge
devices due to the issues of accuracy drop and hardware overhead, where saving …

Cited by 9 Related articles All 5 versions

[PDF] arxiv.org

BOAT: Bilateral local attention vision transformer

T Yu, G Zhao, P Li, Y Yu - arXiv preprint arXiv:2201.13027, 2022 - arxiv.org

Vision Transformers achieved outstanding performance in many computer vision tasks.
Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is …

Cited by 31 Related articles All 3 versions

[PDF] thecvf.com

SPANet: Frequency-balancing token mixer using spectral pooling aggregation modulation

G Yun, J Yoo, K Kim, J Lee… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Recent studies show that self-attentions behave like low-pass filters (as opposed to
convolutions) and enhancing their high-pass filtering capability improves model …

Cited by 5 Related articles All 5 versions

Next generation of computer vision for plant disease monitoring in precision agriculture: A contemporary survey, taxonomy, experiments, and future direction

W Ding, M Abdel-Basset, I Alrashdi, H Hawash - Information Sciences, 2024 - Elsevier

Efficient and rational monitoring of plant health is an essential prerequisite for ensuring
optimal crop production and resource management in the field of agriculture. Computer …

Cited by 5 Related articles

Multigrained hybrid neural network for rotating machinery fault diagnosis using joint local and global information

Z Yang, B He, G Li, P Lu, B Cheng… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Deep learning (DL) models, such as multilayer perceptrons (MLPs) and convolutional neural
networks (CNNs), have strong feature representation and nonlinear mapping capabilities …

Cited by 5 Related articles All 2 versions
