C Wei, Y Zhong, H Tan, Y Liu, Z Zhao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper aims to address universal segmentation for image and video perception with the strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite …
We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the …
Y Wang, S Mei, M Ma, Y Liu, Y Su - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Transformer architecture has demonstrated significant potential in hyperspectral object tracking by leveraging global correlation learning to accurately represent the data …
X Huang, ZQ Cheng, JY He, C Li, W Xiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous driving systems demand real-time, accurate perception to navigate complex environments. Addressing this, we introduce the Dynamic Router Network (DyRoNet), a …
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception. To address this critical need, this paper introduces …
J Yu, H Xiong, L Zhang, H Diao, Y Zhuge… - The Thirty-eighth Annual … - openreview.net
Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding. However, existing methods rely heavily …
CS Lin, MH Chen, IJ Liu, CY Wang, S Liu, YCF Wang - openreview.net
Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence in the video. Most existing methods require end-to-end training with dense …