Omg-llava: Bridging image-level, object-level, pixel-level reasoning and understanding


3 citing articles


[PDF] arxiv.org

Visa: Reasoning video object segmentation via large language models

C Yan, H Wang, S Yan, X Jiang, Y Hu, G Kang… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as
categories, masks, or short phrases, restricting their ability to perform complex video …

Cited by: 2

[PDF] arxiv.org

Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation

S Yu, PH Seo, J Son - arXiv preprint arXiv:2407.07412, 2024 - arxiv.org

We propose a new framework that automatically generates high-quality segmentation masks
with their referring expressions as pseudo supervisions for referring image segmentation …

Cited by: 1

[PDF] arxiv.org

SpeechEE: A Novel Benchmark for Speech Event Extraction

B Wang, M Zhang, H Fei, Y Zhao, B Li, S Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Event extraction (EE) is a critical direction in the field of information extraction, laying an
important foundation for the construction of structured knowledge bases. EE from text has …
