Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection

Y Xiao, Z Luo, Y Liu, Y Ma, H Bian… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted
significant attention due to the growing demand for video analysis. Recent approaches treat …

Open-vocabulary segmentation with semantic-assisted calibration

Y Liu, S Bai, G Li, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper studies open-vocabulary segmentation (OVS) by calibrating the in-vocabulary
and domain-biased embedding space with the generalized contextual prior of CLIP. As the core …

Universal segmentation at arbitrary granularity with language instruction

Y Liu, C Zhang, Y Wang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper aims to achieve universal segmentation at an arbitrary semantic level. Despite
significant progress in recent years, specialist segmentation approaches are limited to …

Decoupling static and hierarchical motion perception for referring video segmentation

S He, H Ding - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …

Towards noise-tolerant speech-referring video object segmentation: Bridging speech and text

X Li, J Wang, X Xu, M Yang, F Yang… - Proceedings of the …, 2023 - aclanthology.org
Linguistic communication is prevalent in Human-Computer Interaction (HCI). Speech
(spoken language) serves as a convenient yet potentially ambiguous form due to noise and …

LoSh: Long-short text joint prediction network for referring video object segmentation

L Yuan, M Shi, Z Yue, Q Chen - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Referring video object segmentation (RVOS) aims to segment the target instance referred to by
a given text expression in a video clip. The text expression normally contains sophisticated …

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation

Z Luo, Y Xiao, Y Liu, Y Wang, Y Tang, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent transformer-based models have dominated the Referring Video Object
Segmentation (RVOS) task due to their superior performance. Most prior works adopt unified …

Video Object Segmentation Using Multi-Scale Attention-Based Siamese Network

Z Zhu, L Qiu, J Wang, J Xiong, H Peng - Electronics, 2023 - mdpi.com
Video target segmentation is a fundamental problem in computer vision that aims to
segment targets from a background by learning their appearance information and movement …

Efficient prompt tuning of large vision-language model for fine-grained ship classification

L Lan, F Wang, S Li, X Zheng, Z Wang, X Liu - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge
due to the high similarity between classes and the limited availability of labeled data, limiting …

Towards Temporally Consistent Referring Video Object Segmentation

B Miao, M Bennamoun, Y Gao, M Shah… - arXiv preprint arXiv …, 2024 - arxiv.org
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining
consistent object segmentation due to temporal context variability and the presence of other …