Segment everything everywhere all at once

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in …, 2024 - proceedings.neurips.cc
In this work, we present SEEM, a promotable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

Visionllm: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

Minigpt-v2: large language model as a unified interface for vision-language multi-task learning

J Chen, D Zhu, X Shen, X Li, Z Liu, P Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models have shown their remarkable capabilities as a general interface for
various language-related applications. Motivated by this, we target to build a unified …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Tracking anything with decoupled video segmentation

HK Cheng, SW Oh, B Price… - Proceedings of the …, 2023 - openaccess.thecvf.com
Training data for video segmentation are expensive to annotate. This impedes extensions of
end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary …

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2 a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …

OMG-Seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work we address various segmentation tasks each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg One Model that is Good enough to efficiently …

Putting the object back into video object segmentation

HK Cheng, SW Oh, B Price, JY Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Cutie a video object segmentation (VOS) network with object-level memory
reading which puts the object representation from memory back into the video object …

Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in neural …, 2024 - proceedings.neurips.cc
Deriving reliable region-word alignment from image-text pairs is critical to learnobject-level
vision-language representations for open-vocabulary object detection. Existing methods …

Hierarchical open-vocabulary universal image segmentation

X Wang, S Li, K Kallidromitis, Y Kato… - Advances in …, 2024 - proceedings.neurips.cc
Open-vocabulary image segmentation aims to partition an image into semantic regions
according to arbitrary text descriptions. However, complex visual scenes can be naturally …