Grounding linguistic commands to navigable regions

GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

H Liao, H Shen, Z Li, C Wang, G Li, Y Bie… - … in Transportation Research, 2024 - Elsevier

In the field of autonomous vehicles (AVs), accurately discerning commander intent and
executing linguistic commands within a visual context presents a significant challenge. This …

Cited by 25

Ground then navigate: Language-guided navigation in dynamic scenes

K Jain, V Chhangani, A Tiwari… - … on Robotics and …, 2023 - ieeexplore.ieee.org

We investigate the Vision-and-Language Navigation (VLN) problem in the context of
autonomous driving in outdoor settings. We solve the problem by explicitly grounding the …

Cited by 19

Area-keywords cross-modal alignment for referring image segmentation

H Zhang, L Wang, S Li, K Xu, B Yin - Neurocomputing, 2024 - Elsevier

Referring image segmentation aims to segment the instance corresponding to the given
language description, which requires aligning information from two modalities. Existing …

Cited by 1

Trimodal Navigable Region Segmentation Model: Grounding Navigation Instructions in Urban Areas

N Hosomi, S Hatanaka, Y Iioka, W Yang… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org

In this study, we develop a model that enables mobilities to have more friendly interactions
with users. Specifically, we focus on the referring navigable regions task in which a model …

LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

P Paul, A Garg, T Choudhary, AK Singh… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a
set of control actions as a reactive solution for closed-loop planning based on their rich …

Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images

S Sharma, UR Nair, US Parihar… - arXiv preprint arXiv …, 2022 - arxiv.org

Autonomous driving requires efficient reasoning about the location and appearance of the
different agents in the scene, which aids in downstream tasks such as object detection …

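市街地での移動指示理解タスクにおけるUNITER Regressor による目標位置予測 [Target position prediction with a UNITER Regressor in a task of understanding movement instructions in urban areas]

畑中駿平，細見直希，翠輝久，山田健太郎… - … Research Meeting, 126th (2022/08), 2022 - jstage.jst.go.jp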

Recent advancement of vehicle automation technology is expected to improve the
interaction between human and mobility modes. As the promising means, language …

Cited by 1

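市街地での移動指示文に基づく目標領域予測 [Target region prediction based on movement instruction sentences in urban areas]

畑中駿平，楊巍，九曜克之，細見直希… - … Society National Convention Proceedings, No. …, 2023 - jstage.jst.go.jp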

Abstract: This study concerns the Trimodal Navigable Region Segmentation Model (TNRSM), which can handle three modalities:
images, navigation instructions, and semantic segmentation masks …
