Spectrum-guided multi-granularity referring video object segmentation

B Miao, M Bennamoun, Y Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current referring video object segmentation (R-VOS) techniques extract conditional kernels
from encoded (low-resolution) vision-language features to segment the decoded high …

Segment any anomaly without training via hybrid prompt regularization

Y Cao, X Xu, C Sun, Y Cheng, Z Du, L Gao… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a novel framework, i.e., Segment Any Anomaly+ (SAA+), for zero-shot anomaly
segmentation with hybrid prompt regularization to improve the adaptability of modern …

UniVS: Unified and universal video segmentation with prompts as queries

M Li, S Li, X Zhang, L Zhang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Despite the recent advances in unified image segmentation (IS), developing a unified video
segmentation (VS) model remains a challenge. This is mainly because generic category …

Described object detection: Liberating object detection with flexible expressions

C Xie, Z Zhang, Y Wu, F Zhu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Detecting objects based on language information is a popular task that includes Open-
Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this …

PaintSeg: Painting pixels for training-free segmentation

X Li, CC Lin, Y Chen, Z Liu, J Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
The paper introduces PaintSeg, a new unsupervised method for segmenting objects without
any training. We propose an adversarial masked contrastive painting (AMCP) process …

Towards robust referring image segmentation

J Wu, X Li, X Li, H Ding, Y Tong… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs
object masks based on text descriptions. Many works have achieved considerable progress …

Decoupling static and hierarchical motion perception for referring video segmentation

S He, H Ding - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …

Towards noise-tolerant speech-referring video object segmentation: Bridging speech and text

X Li, J Wang, X Xu, M Yang, F Yang… - Proceedings of the …, 2023 - aclanthology.org
Linguistic communication is prevalent in Human-Computer Interaction (HCI). Speech
(spoken language) serves as a convenient yet potentially ambiguous form due to noise and …

Learning cross-modal affinity for referring video object segmentation targeting limited samples

G Li, M Gao, H Liu, X Zhen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Referring video object segmentation (RVOS), as a supervised learning task, relies on
sufficient annotated data for a given scene. However, in more realistic scenarios, only …