Evolving Interpretable Visual Classifiers with Large Language Models

M Chiquier, U Mall, C Vondrick - arXiv preprint arXiv:2404.09941, 2024 - arxiv.org
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to
their open-vocabulary flexibility and high performance. However, vision-language models …

Open-Vocabulary Object Detection via Neighboring Region Attention Alignment

S Qiang, X Li, Y Liang, W Liao, T He, P Peng - arXiv preprint arXiv …, 2024 - arxiv.org
The nature of diversity in real-world environments necessitates neural network models to
expand from closed category settings to accommodate novel emerging categories. In this …

Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation

Y Zheng, K Liu - arXiv preprint arXiv:2404.08603, 2024 - arxiv.org
Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects
from novel classes unseen at the training time. Whereas, empirical studies reveal that …