PolyFormer: Referring image segmentation as sequential polygon generation

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 198 · Related articles · All 6 versions


Contextual object detection with multimodal large language models

Y Zang, W Li, J Han, K Zhou, CC Loy - International Journal of Computer …, 2024 - Springer

Recent Multimodal Large Language Models (MLLMs) are remarkable in vision-language
tasks, such as image captioning and question answering, but lack the essential …

Cited by 68 · Related articles · All 2 versions


Hierarchical open-vocabulary universal image segmentation

X Wang, S Li, K Kallidromitis, Y Kato… - Advances in …, 2024 - proceedings.neurips.cc

Open-vocabulary image segmentation aims to partition an image into semantic regions
according to arbitrary text descriptions. However, complex visual scenes can be naturally …

Cited by 32 · Related articles · All 5 versions


Florence-2: Advancing a unified representation for a variety of vision tasks

B Xiao, H Wu, W Xu, X Dai, H Hu, Y Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce Florence-2, a novel vision foundation model with a unified prompt-based
representation for various computer vision and vision-language tasks. While existing large …

Cited by 77 · Related articles · All 3 versions


General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this work, we present GLEE, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Cited by 34 · Related articles · All 3 versions


PaLI-3 vision language models: Smaller, faster, stronger

X Chen, X Wang, L Beyer, A Kolesnikov, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that
compares favorably to similar models that are 10x larger. As part of arriving at this strong …

Cited by 69 · Related articles · All 3 versions


Zero-shot referring image segmentation with global-local context features

S Yu, PH Seo, J Son - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Referring image segmentation (RIS) aims to find a segmentation mask given a referring
expression grounded to a region of the input image. Collecting labelled datasets for this …

Cited by 53 · Related articles · All 7 versions


ReMamber: Referring image segmentation with Mamba twister

Y Yang, C Ma, J Yao, Z Zhong, Y Zhang… - European Conference on …, 2025 - Springer

Referring Image Segmentation (RIS) leveraging transformers has achieved great
success on the interpretation of complex visual-language tasks. However, the quadratic …

Cited by 15 · Related articles · All 2 versions


GSVA: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com

Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Cited by 32 · Related articles · All 3 versions


Segment every reference object in spatial and temporal spaces

J Wu, Y Jiang, B Yan, H Lu… - Proceedings of the …, 2023 - openaccess.thecvf.com

The reference-based object segmentation tasks, namely referring image segmentation
(RIS), referring video object segmentation (RVOS), and video object segmentation (VOS) …

Cited by 11 · Related articles · All 3 versions
