Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
Large language models have shown their remarkable capabilities as a general interface for various language-related applications. Motivated by this, we target to build a unified …
Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened …
Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary …
We present Unified-IO 2 a multimodal and multi-skill unified model capable of following novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …
In this work we address various segmentation tasks each traditionally tackled by distinct or partially unified models. We propose OMG-Seg One Model that is Good enough to efficiently …
We present Cutie a video object segmentation (VOS) network with object-level memory reading which puts the object representation from memory back into the video object …
Deriving reliable region-word alignment from image-text pairs is critical to learnobject-level vision-language representations for open-vocabulary object detection. Existing methods …
Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions. However, complex visual scenes can be naturally …