Visa: Reasoning video object segmentation via large language models

C Yan, H Wang, S Yan, X Jiang, Y Hu, G Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as
categories, masks, or short phrases, restricting their ability to perform complex video …

Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation

S Yu, PH Seo, J Son - arXiv preprint arXiv:2407.07412, 2024 - arxiv.org
We propose a new framework that automatically generates high-quality segmentation masks
with their referring expressions as pseudo supervisions for referring image segmentation …

SpeechEE: A Novel Benchmark for Speech Event Extraction

B Wang, M Zhang, H Fei, Y Zhao, B Li, S Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Event extraction (EE) is a critical direction in the field of information extraction, laying an
important foundation for the construction of structured knowledge bases. EE from text has …