LLMs can evolve continually on modality for X-modal reasoning

J Yu, H Xiong, L Zhang, H Diao, Y Zhuge… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have gained significant attention due to their
impressive capabilities in multimodal understanding. However, existing methods rely heavily …

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

C Wei, Y Zhong, H Tan, Y Liu, Z Zhao, J Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper aims to address universal segmentation for image and video perception with the
strong reasoning ability empowered by Visual Large Language Models (VLLMs). Despite …

One token to seg them all: Language instructed reasoning segmentation in videos

Z Bai, T He, H Mei, P Wang, Z Gao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VideoLISA, a video-based multimodal large language model designed to
tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the …

HTACPE: A Hybrid Transformer with Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker

Y Wang, S Mei, M Ma, Y Liu, Y Su - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Transformer architecture has demonstrated significant potential in hyperspectral object
tracking by leveraging global correlation learning to accurately represent the data …

DyRoNet: A Low-Rank Adapter Enhanced Dynamic Routing Network for Streaming Perception

X Huang, ZQ Cheng, JY He, C Li, W Xiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous driving systems demand real-time, accurate perception to navigate complex
environments. Addressing this, we introduce the Dynamic Router Network (DyRoNet), a …

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception

X Huang, ZQ Cheng, JY He, C Li, W Xiang, B Sun… - CoRR, 2024 - researchgate.net
The advancement of autonomous driving systems hinges on the ability to achieve
low-latency and high-accuracy perception. To address this critical need, this paper introduces …

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

J Yu, H Xiong, L Zhang, H Diao, Y Zhuge… - The Thirty-eighth Annual … - openreview.net
Multimodal Large Language Models (MLLMs) have gained significant attention due to their
impressive capabilities in multimodal understanding. However, existing methods rely heavily …

Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

CS Lin, MH Chen, IJ Liu, CY Wang, S Liu, YCF Wang - openreview.net
Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the
query sentence in the video. Most existing methods require end-to-end training with dense …