[HTML][HTML] A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

Learning from noisy labels with deep neural networks: A survey

H Song, M Kim, D Park, Y Shin… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Deep learning has achieved remarkable success in numerous domains with help from large
amounts of big data. However, the quality of data labels is a concern because of the lack of …

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Cogvlm: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the input space …

Glipv2: Unifying localization and vision-language understanding

H Zhang, P Zhang, X Hu, YC Chen… - Advances in …, 2022 - proceedings.neurips.cc
We present GLIPv2, a grounded VL understanding model, that serves both localization tasks
(eg, object detection, instance segmentation) and Vision-Language (VL) understanding …

Grounded language-image pre-training

LH Li, P Zhang, H Zhang, J Yang, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper presents a grounded language-image pre-training (GLIP) model for learning
object-level, language-aware, and semantic-rich visual representations. GLIP unifies object …

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

Vim: Out-of-distribution with virtual-logit matching

H Wang, Z Li, L Feng, W Zhang - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Most of the existing Out-Of-Distribution (OOD) detection algorithms depend on single input
source: the feature, the logit, or the softmax probability. However, the immense diversity of …

Detclip: Dictionary-enriched visual-concept paralleled pre-training for open-world detection

L Yao, J Han, Y Wen, X Liang, D Xu… - Advances in …, 2022 - proceedings.neurips.cc
Open-world object detection, as a more general and challenging goal, aims to recognize
and localize objects described by arbitrary category names. The recent work GLIP …

Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

S Changpinyo, P Sharma, N Ding… - Proceedings of the …, 2021 - openaccess.thecvf.com
The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …