Robust Referring Video Object Segmentation with Cyclic Structural Consensus

Authors

Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu

Publication date

2023

Conference

Proceedings of the IEEE/CVF International Conference on Computer Vision

Pages

22236-22245

Description

Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression. Most existing R-VOS methods make a critical assumption: the object referred to must appear in the video. This assumption, which we refer to as "semantic consensus", is often violated in real-world scenarios, where the expression may be queried against false videos. In this work, we highlight the need for a robust R-VOS model that can handle semantic mismatches. Accordingly, we propose an extended task called Robust R-VOS (RRVOS), which accepts unpaired video-text inputs. We tackle this problem by jointly modeling the primary R-VOS problem and its dual (text reconstruction). A structural text-to-text cycle constraint is introduced to discriminate semantic consensus between video-text pairs and impose it in positive pairs, thereby achieving multi-modal alignment from both positive and negative pairs. Our structural constraint effectively addresses the challenge posed by linguistic diversity, overcoming the limitations of previous methods that relied on a point-wise constraint. A new evaluation dataset, RRYTVOS, is constructed to measure model robustness. Our model achieves state-of-the-art performance on the R-VOS benchmarks Ref-DAVIS17 and Ref-Youtube-VOS, as well as on our RRYTVOS dataset.

Total citations

Cited by 30

Citations per year: 2022: 1, 2023: 11, 2024: 18

Scholar articles

Robust referring video object segmentation with cyclic structural consensus

X Li, J Wang, X Xu, X Li, B Raj, Y Lu - Proceedings of the IEEE/CVF International Conference …, 2023

Cited by 15, all 3 versions

R^2VOS: Robust Referring Video Object Segmentation via Relational Multimodal Cycle Consistency*

X Li, J Wang, X Xu, X Li, Y Lu, B Raj - arXiv preprint arXiv:2207.01203, 2022

Cited by 15, all 2 versions

R^2-VOS: Robust Referring Video Object Segmentation via Relational Cycle Consistency*

X Li, J Wang, X Xu, X Li, Y Lu, B Raj - 2022