Universal instance perception as object discovery and retrieval

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in …, 2024 - proceedings.neurips.cc

In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

被引用次数：310 相关文章所有 5 个版本

[PDF] neurips.cc

Visionllm: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc

Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

被引用次数：248 相关文章所有 6 个版本

[PDF] arxiv.org

Minigpt-v2: large language model as a unified interface for vision-language multi-task learning

J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models have shown their remarkable capabilities as a general interface for
various language-related applications. Motivated by this, we target to build a unified …

被引用次数：244 相关文章所有 6 个版本

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

被引用次数：104 相关文章所有 6 个版本

[PDF] thecvf.com

Tracking anything with decoupled video segmentation

HK Cheng, SW Oh, B Price… - Proceedings of the …, 2023 - openaccess.thecvf.com

Training data for video segmentation are expensive to annotate. This impedes extensions of
end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary …

被引用次数：58 相关文章所有 7 个版本

[PDF] thecvf.com

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2 a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …

被引用次数：37 相关文章所有 3 个版本

[PDF] thecvf.com

OMG-Seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this work we address various segmentation tasks each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg One Model that is Good enough to efficiently …

被引用次数：22 相关文章所有 3 个版本

[PDF] thecvf.com

Putting the object back into video object segmentation

HK Cheng, SW Oh, B Price, JY Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Cutie a video object segmentation (VOS) network with object-level memory
reading which puts the object representation from memory back into the video object …

被引用次数：25 相关文章所有 4 个版本

[PDF] neurips.cc

Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in neural …, 2024 - proceedings.neurips.cc

Deriving reliable region-word alignment from image-text pairs is critical to learnobject-level
vision-language representations for open-vocabulary object detection. Existing methods …

被引用次数：22 相关文章所有 6 个版本

[PDF] neurips.cc

Hierarchical open-vocabulary universal image segmentation

X Wang, S Li, K Kallidromitis, Y Kato… - Advances in …, 2024 - proceedings.neurips.cc

Open-vocabulary image segmentation aims to partition an image into semantic regions
according to arbitrary text descriptions. However, complex visual scenes can be naturally …

被引用次数：19 相关文章所有 5 个版本

高级搜索

QQ 群