Tracking with human-intent reasoning

J Yu, H Xiong, L Zhang, H Diao, Y Zhuge… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) have gained significant attention due to their
impressive capabilities in multimodal understanding. However, existing methods rely heavily …

Cited by: 2  Related articles  All 2 versions

[PDF] arxiv.org

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

C Wei, Y Zhong, H Tan, Y Liu, Z Zhao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper aims to address universal segmentation for image and video perception with the
strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite …

Cited by: 1  Related articles  All 2 versions

[PDF] arxiv.org

One token to seg them all: Language instructed reasoning segmentation in videos

Z Bai, T He, H Mei, P Wang, Z Gao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce VideoLISA, a video-based multimodal large language model designed to
tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the …

Cited by: 6  Related articles  All 4 versions

HTACPE: A Hybrid Transformer with Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker

Y Wang, S Mei, M Ma, Y Liu, Y Su - IEEE Transactions on …, 2025 - ieeexplore.ieee.org

Transformer architecture has demonstrated significant potential in hyperspectral object
tracking by leveraging global correlation learning to accurately represent the data …

[PDF] arxiv.org

[PDF] openreview.net

Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

CS Lin, MH Chen, IJ Liu, CY Wang, S Liu, YCF Wang - openreview.net

Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the
query sentence in the video. Most existing methods require end-to-end training with dense …