Openimages: A public dataset for large-scale multi-label and multi-class image classification

[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com

YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

被引用次数：1213 相关文章所有 6 个版本

[PDF] arxiv.org

Learning from noisy labels with deep neural networks: A survey

H Song, M Kim, D Park, Y Shin… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Deep learning has achieved remarkable success in numerous domains with help from large
amounts of big data. However, the quality of data labels is a concern because of the lack of …

被引用次数：1082 相关文章所有 10 个版本

[PDF] arxiv.org

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we present an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

被引用次数：1042 相关文章所有 4 个版本

[PDF] arxiv.org

Cogvlm: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the input space …

被引用次数：308 相关文章所有 3 个版本

[PDF] neurips.cc

Glipv2: Unifying localization and vision-language understanding

H Zhang, P Zhang, X Hu, YC Chen… - Advances in …, 2022 - proceedings.neurips.cc

We present GLIPv2, a grounded VL understanding model, that serves both localization tasks
(eg, object detection, instance segmentation) and Vision-Language (VL) understanding …

被引用次数：253 相关文章所有 4 个版本

[PDF] thecvf.com

Grounded language-image pre-training

LH Li, P Zhang, H Zhang, J Yang, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

This paper presents a grounded language-image pre-training (GLIP) model for learning
object-level, language-aware, and semantic-rich visual representations. GLIP unifies object …

被引用次数：903 相关文章所有 8 个版本

[PDF] thecvf.com

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

被引用次数：699 相关文章所有 10 个版本

[PDF] thecvf.com

Vim: Out-of-distribution with virtual-logit matching

H Wang, Z Li, L Feng, W Zhang - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Most of the existing Out-Of-Distribution (OOD) detection algorithms depend on single input
source: the feature, the logit, or the softmax probability. However, the immense diversity of …

被引用次数：251 相关文章所有 5 个版本

[PDF] neurips.cc

Detclip: Dictionary-enriched visual-concept paralleled pre-training for open-world detection

L Yao, J Han, Y Wen, X Liang, D Xu… - Advances in …, 2022 - proceedings.neurips.cc

Open-world object detection, as a more general and challenging goal, aims to recognize
and localize objects described by arbitrary category names. The recent work GLIP …

被引用次数：107 相关文章所有 5 个版本

[PDF] thecvf.com

Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

S Changpinyo, P Sharma, N Ding… - Proceedings of the …, 2021 - openaccess.thecvf.com

The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …

被引用次数：875 相关文章所有 9 个版本

高级搜索

QQ 群