MeViS: A large-scale benchmark for video segmentation with motion expressions

H Ding, C Liu, S He, X Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper strives for motion expressions guided video segmentation, which focuses on
segmenting objects in video content based on a sentence describing the motion of the …

LAVT: Language-aware vision transformer for referring image segmentation

Z Yang, J Wang, Y Tang, K Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Referring image segmentation is a fundamental vision-language task that aims to segment
out an object referred to by a natural language expression from an image. One of the key …

Visual semantic segmentation based on few/zero-shot learning: An overview

W Ren, Y Tang, Q Sun, C Zhao… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org
Visual semantic segmentation aims at separating a visual sample into diverse blocks with
specific semantic attributes and identifying the category for each block, and it plays a crucial …

Matting anything

J Li, J Jain, H Shi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile
framework for estimating the alpha matte of any instance in an image with flexible and …

Language as queries for referring video object segmentation

J Wu, Y Jiang, P Sun, Z Yuan… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to
segment the target object referred by a language expression in all video frames. In this work …

End-to-end referring video object segmentation with multimodal transformers

A Botach, E Zheltonozhskii… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The referring video object segmentation task (RVOS) involves segmentation of a
text-referred object instance in the frames of a given video. Due to the complex nature of this …

Spectrum-guided multi-granularity referring video object segmentation

B Miao, M Bennamoun, Y Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current referring video object segmentation (R-VOS) techniques extract conditional kernels
from encoded (low-resolution) vision-language features to segment the decoded high …

Multi-level representation learning with semantic alignment for referring video object segmentation

D Wu, X Dong, L Shao, J Shen - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Referring video object segmentation (RVOS) is a challenging language-guided video
grounding task, which requires comprehensively understanding the semantic information of …

You only infer once: Cross-modal meta-transfer for referring video object segmentation

D Li, R Li, L Wang, Y Wang, J Qi, L Zhang… - Proceedings of the …, 2022 - ojs.aaai.org
We present YOFO (You Only inFer Once), a new paradigm for referring video object
segmentation (RVOS) that operates in a one-stage manner. Our key insight is that the …

Decoupling static and hierarchical motion perception for referring video segmentation

S He, H Ding - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …