RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

MobileNetV4: Universal Models for the Mobile Ecosystem

D Qin, C Leichner, M Delakis, M Fornoni, S Luo… - … on Computer Vision, 2025 - Springer
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature
universally efficient architecture designs for mobile devices. We introduce the Universal …

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention, but two unresolved and critical issues demand further investigation. 1) …

MobileCLIP: Fast image-text models through multi-modal reinforced training

PKA Vasu, H Pouransari, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …

Self-supervised masked convolutional transformer block for anomaly detection

N Madan, NC Ristea, RT Ionescu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Anomaly detection has recently gained increasing attention in the field of computer vision,
likely due to its broad set of applications ranging from product fault detection on industrial …

SHViT: Single-head vision transformer with memory efficient macro design

S Yun, Y Ro - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, efficient Vision Transformers have shown great performance with low
latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings …

Pruning self-attentions into convolutional layers in single path

H He, J Cai, J Liu, Z Pan, J Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved impressive performance over various computer
vision tasks. However, modeling global correlations with multi-head self-attention (MSA) …

Model compression in practice: Lessons learned from practitioners creating on-device machine learning experiences

F Hohman, MB Kery, D Ren, D Moritz - … of the CHI Conference on Human …, 2024 - dl.acm.org
On-device machine learning (ML) promises to improve the privacy, responsiveness, and
proliferation of new, intelligent user experiences by moving ML computation onto everyday …

Real-time semantic segmentation for autonomous driving: A review of CNNs, Transformers, and Beyond

MAM Elhassan, C Zhou, A Khan, A Benabid… - Journal of King Saud …, 2024 - Elsevier
Real-time semantic segmentation is a crucial component of autonomous driving systems,
where accurate and efficient scene interpretation is essential to ensure both safety and …

DiTFastAttn: Attention compression for diffusion transformer models

Z Yuan, H Zhang, P Lu, X Ning, L Zhang, T Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Transformers (DiT) excel at image and video generation but face computational
challenges due to the quadratic complexity of self-attention operators. We propose …