WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

Authors

Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, Xiaohui Zhu, Eng Gee Lim, Jeremy Smith, Ka Lok Man, Xuming Hu, Yutao Yue

Publication date

2024/3/19

Journal

arXiv preprint arXiv:2403.12686

Description

The perception of waterways based on human intent is important for the autonomous navigation and operation of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception based on human intention prompts. WaterVG encompasses prompts describing multiple targets, with instance-level annotations including bounding boxes and masks. Notably, WaterVG includes 11,568 samples with 34,950 referred targets and integrates both visual and radar characteristics captured by a monocular camera and a millimeter-wave (mmWave) radar, enabling a finer granularity of text prompts. Furthermore, we propose Potamoi, a novel multi-modal, multi-task visual grounding model based on the one-stage paradigm, with a Phased Heterogeneous Modality Fusion (PHMF) structure that includes Adaptive Radar Weighting (ARW) and Multi-Head Slim Cross Attention (MHSCA). Specifically, MHSCA is a low-cost, efficient fusion module with a small parameter count and few FLOPs that aligns and fuses the scenario context captured by the two sensors with linguistic features, which can effectively address the tasks of referring expression comprehension and segmentation based on fine-grained prompts. Comprehensive experiments and evaluations have been conducted on WaterVG, where Potamoi achieves state-of-the-art performance compared with counterparts.

Total citations

Cited by 2

Citations per year: 2024: 2

Scholar articles

Watervg: Waterway visual grounding based on text-guided vision and mmwave radar

R Guan, L Jia, F Yang, S Yao, E Purwanto, X Zhu… - arXiv preprint arXiv:2403.12686, 2024
