Z Zhang, Y Ma, E Zhang, X Bai - arXiv preprint arXiv:2403.14598, 2024 - arxiv.org
PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges. To overcome the limitation of the LMM being limited to textual …
J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that unifies visual perception, understanding, and generation within a single framework. Unlike …
Using instance segmentation and video inpainting provides a significant leap in real-time football video broadcast enhancements by removing potential visual distractions, such as an …
H Xu, X Zhang, J He, Z Geng, Y Yu… - IEEE Sensors …, 2024 - ieeexplore.ieee.org
In recent years, the significance of unmanned surface vehicles (USVs) has grown substantially across a wide range of applications. Monocular cameras, as the most common …
A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location …
Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical …
Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …
P Liao, F Yang, D Wu, L Bo - arXiv preprint arXiv:2405.15176, 2024 - arxiv.org
Monocular vision-based 3D object detection is crucial in various sectors, yet existing methods face significant challenges in terms of accuracy and computational efficiency …