LISA: Reasoning segmentation via large language model

X Lai, Z Tian, Y Chen, Y Li, Y Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although perception systems have made remarkable advancements in recent years, they still
rely on explicit human instruction or pre-defined categories to identify the target objects …

What does a platypus look like? Generating customized prompts for zero-shot image classification

S Pratt, I Covert, R Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …

Semantic-SAM: Segment and recognize anything at any granularity

F Li, H Zhang, P Sun, X Zou, S Liu, J Yang, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable
segment and recognize anything at any desired granularity. Our model offers two key …

SemMAE: Semantic-guided masking for learning masked autoencoders

G Li, H Zheng, D Liu, C Wang, B Su… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, significant progress has been made in masked image modeling to catch up to
masked language modeling. However, unlike words in NLP, the lack of semantic …

PACO: Parts and attributes of common objects

V Ramanathan, A Kalia, V Petrovic… - Proceedings of the …, 2023 - openaccess.thecvf.com
Object models are gradually progressing from predicting just category labels to providing
detailed descriptions of object instances. This motivates the need for large datasets which …

Osprey: Pixel understanding with visual instruction tuning

Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have recently achieved impressive
general-purpose vision-language capabilities through visual instruction tuning. However, current …

Going denser with open-vocabulary part segmentation

P Sun, S Chen, C Zhu, F Xiao, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Object detection has been expanded from a limited number of categories to open
vocabulary. Moving forward, a complete intelligent vision system requires understanding …

PIP-Net: Patch-based intuitive prototypes for interpretable image classification

M Nauta, J Schlötterer… - Proceedings of the …, 2023 - openaccess.thecvf.com
Interpretable methods based on prototypical patches recognize various components in an
image in order to explain their reasoning to humans. However, existing prototype-based …

Dataset pruning: Reducing training data by examining generalization influence

S Yang, Z Xie, H Peng, M Xu, M Sun, P Li - arXiv preprint arXiv …, 2022 - arxiv.org
The great success of deep learning heavily relies on increasingly larger training data, which
comes at a price of huge computational and infrastructural costs. This poses crucial …

Animal3D: A comprehensive dataset of 3D animal pose and shape

J Xu, Y Zhang, J Peng, W Ma… - Proceedings of the …, 2023 - openaccess.thecvf.com
Accurately estimating the 3D pose and shape is an essential step towards understanding
animal behavior, and can potentially benefit many downstream applications, such as wildlife …