- Academic resource search

[HTML] A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com

YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

Cited by 1664 · Related articles · All 6 versions

[PDF] arxiv.org

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Cited by 563 · Related articles · All 2 versions

[PDF] thecvf.com

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Cited by 7540 · Related articles · All 12 versions

[PDF] arxiv.org

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2025 - Springer

In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Cited by 1448 · Related articles · All 4 versions

[PDF] arxiv.org

YOLOv9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HY Mark Liao - European Conference on Computer …, 2025 - Springer

Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …

Cited by 1036 · Related articles · All 3 versions

[PDF] nature.com

Segment anything in medical images

J Ma, Y He, F Li, L Han, C You, B Wang - Nature Communications, 2024 - nature.com

Medical image segmentation is a critical component in clinical practice, facilitating accurate
diagnosis, treatment planning, and disease monitoring. However, existing methods, often …

Cited by 1120 · Related articles · All 11 versions

[PDF] thecvf.com

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com

To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …

Cited by 1014 · Related articles · All 10 versions

[PDF] thecvf.com

BiFormer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

As the core building block of vision transformers, attention is a powerful tool to capture
long-range dependency. However, such power comes at a cost: it incurs a huge computation …

Cited by 633 · Related articles · All 10 versions

[PDF] neurips.cc

VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

Cited by 416 · Related articles · All 6 versions

[PDF] arxiv.org

Diffusion policy: Visuomotor policy learning via action diffusion

C Chi, Z Xu, S Feng, E Cousineau… - … Journal of Robotics …, 2023 - journals.sagepub.com

This paper introduces Diffusion Policy, a new way of generating robot behavior by
representing a robot's visuomotor policy as a conditional denoising diffusion process. We …

Cited by 476 · Related articles · All 6 versions
