- 学术资源搜索

[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com

YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

被引用次数：901 相关文章所有 6 个版本

[PDF] springer.com

Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer

Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

被引用次数：1387 相关文章所有 8 个版本

[PDF] thecvf.com

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com

To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …

被引用次数：551 相关文章所有 10 个版本

[PDF] thecvf.com

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

被引用次数：491 相关文章所有 8 个版本

[PDF] thecvf.com

Eva: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

被引用次数：447 相关文章所有 5 个版本

[PDF] thecvf.com

Convnext v2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

被引用次数：353 相关文章所有 8 个版本

[PDF] neurips.cc

Segnext: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc

We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of se-mantic …

被引用次数：426 相关文章所有 6 个版本

[PDF] arxiv.org

[PDF][PDF] Timesnet: Temporal 2d-variation modeling for general time series analysis

H Wu, T Hu, Y Liu, H Zhou, J Wang, M Long - arXiv preprint arXiv …, 2022 - arxiv.org

Time series analysis is of immense importance in extensive applications, such as weather
forecasting, anomaly detection, and action recognition. This paper focuses on temporal …

被引用次数：426 相关文章所有 5 个版本

[PDF] arxiv.org

Vision mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently the state space models (SSMs) with efficient hardware-aware designs, ie, the
Mamba deep learning model, have shown great potential for long sequence modeling …

被引用次数：293 相关文章所有 5 个版本

[HTML] sciencedirect.com

[HTML][HTML] Deep learning in food category recognition

Y Zhang, L Deng, H Zhu, W Wang, Z Ren, Q Zhou… - Information …, 2023 - Elsevier

Integrating artificial intelligence with food category recognition has been a field of interest for
research for the past few decades. It is potentially one of the next steps in revolutionizing …

被引用次数：201 相关文章所有 4 个版本

高级搜索

QQ 群

[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas

Attention mechanisms in computer vision: A survey

Run, don't walk: chasing higher FLOPS for faster neural networks

Internimage: Exploring large-scale vision foundation models with deformable convolutions

Eva: Exploring the limits of masked visual representation learning at scale

Convnext v2: Co-designing and scaling convnets with masked autoencoders

Segnext: Rethinking convolutional attention design for semantic segmentation

[PDF][PDF] Timesnet: Temporal 2d-variation modeling for general time series analysis

Vision mamba: Efficient visual representation learning with bidirectional state space model

[HTML][HTML] Deep learning in food category recognition

引用