RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

MobileNetV4: Universal Models for the Mobile Ecosystem

D Qin, C Leichner, M Delakis, M Fornoni, S Luo… - … on Computer Vision, 2025 - Springer
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature
universally efficient architecture designs for mobile devices. We introduce the Universal …

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention, but two unresolved and critical issues demand further investigation. 1) …

MobileCLIP: Fast image-text models through multi-modal reinforced training

PKA Vasu, H Pouransari, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …

Self-supervised masked convolutional transformer block for anomaly detection

N Madan, NC Ristea, RT Ionescu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Anomaly detection has recently gained increasing attention in the field of computer vision,
likely due to its broad set of applications ranging from product fault detection on industrial …

SHViT: Single-head vision transformer with memory efficient macro design

S Yun, Y Ro - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, efficient Vision Transformers have shown great performance with low
latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings …

Pruning self-attentions into convolutional layers in single path

H He, J Cai, J Liu, Z Pan, J Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved impressive performance over various computer
vision tasks. However, modeling global correlations with multi-head self-attention (MSA) …

Model compression in practice: Lessons learned from practitioners creating on-device machine learning experiences

F Hohman, MB Kery, D Ren, D Moritz - … of the CHI Conference on Human …, 2024 - dl.acm.org
On-device machine learning (ML) promises to improve the privacy, responsiveness, and
proliferation of new, intelligent user experiences by moving ML computation onto everyday …

Real-time semantic segmentation for autonomous driving: A review of CNNs, Transformers, and Beyond

MAM Elhassan, C Zhou, A Khan, A Benabid… - Journal of King Saud …, 2024 - Elsevier
Real-time semantic segmentation is a crucial component of autonomous driving systems,
where accurate and efficient scene interpretation is essential to ensure both safety and …

DiTFastAttn: Attention compression for diffusion transformer models

Z Yuan, H Zhang, P Lu, X Ning, L Zhang, T Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers (DiT) excel at image and video generation but face computational
challenges due to the quadratic complexity of self-attention operators. We propose …