PartImageNet: A large, high-quality dataset of parts

LISA: Reasoning segmentation via large language model

X Lai, Z Tian, Y Chen, Y Li, Y Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com

Although perception systems have made remarkable advancements in recent years, they still
rely on explicit human instruction or pre-defined categories to identify the target objects …

Cited by 166 · Related articles · All 4 versions

[PDF] thecvf.com

What does a platypus look like? Generating customized prompts for zero-shot image classification

S Pratt, I Covert, R Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …

Cited by 137 · Related articles · All 7 versions

[PDF] arxiv.org

Semantic-SAM: Segment and recognize anything at any granularity

F Li, H Zhang, P Sun, X Zou, S Liu, J Yang, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable
segment and recognize anything at any desired granularity. Our model offers two key …

Cited by 89 · Related articles · All 2 versions

[PDF] neurips.cc

SemMAE: Semantic-guided masking for learning masked autoencoders

G Li, H Zheng, D Liu, C Wang, B Su… - Advances in Neural …, 2022 - proceedings.neurips.cc

Recently, significant progress has been made in masked image modeling to catch up to
masked language modeling. However, unlike words in NLP, the lack of semantic …

Cited by 75 · Related articles · All 5 versions

[PDF] thecvf.com

PACO: Parts and attributes of common objects

V Ramanathan, A Kalia, V Petrovic… - Proceedings of the …, 2023 - openaccess.thecvf.com

Object models are gradually progressing from predicting just category labels to providing
detailed descriptions of object instances. This motivates the need for large datasets which …

Cited by 47 · Related articles · All 5 versions

[PDF] thecvf.com

Osprey: Pixel understanding with visual instruction tuning

Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal large language models (MLLMs) have recently achieved impressive general-
purpose vision-language capabilities through visual instruction tuning. However, current …

Cited by 20 · Related articles · All 4 versions

[PDF] thecvf.com

Going denser with open-vocabulary part segmentation

P Sun, S Chen, C Zhu, F Xiao, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com

Object detection has been expanded from a limited number of categories to open
vocabulary. Moving forward, a complete intelligent vision system requires understanding …

Cited by 18 · Related articles · All 6 versions

[PDF] thecvf.com

PIP-Net: Patch-based intuitive prototypes for interpretable image classification

M Nauta, J Schlötterer… - Proceedings of the …, 2023 - openaccess.thecvf.com

Interpretable methods based on prototypical patches recognize various components in an
image in order to explain their reasoning to humans. However, existing prototype-based …

Cited by 31 · Related articles · All 6 versions

[PDF] arxiv.org

Dataset pruning: Reducing training data by examining generalization influence

S Yang, Z Xie, H Peng, M Xu, M Sun, P Li - arXiv preprint arXiv …, 2022 - arxiv.org

The great success of deep learning heavily relies on increasingly larger training data, which
comes at a price of huge computational and infrastructural costs. This poses crucial …

Cited by 60 · Related articles · All 4 versions

[PDF] thecvf.com

Animal3D: A comprehensive dataset of 3D animal pose and shape

J Xu, Y Zhang, J Peng, W Ma… - Proceedings of the …, 2023 - openaccess.thecvf.com

Accurately estimating the 3D pose and shape is an essential step towards understanding
animal behavior, and can potentially benefit many downstream applications, such as wildlife …

Cited by 9 · Related articles · All 7 versions
