GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

H Liao, H Shen, Z Li, C Wang, G Li, Y Bie… - … in Transportation Research, 2024 - Elsevier
In the field of autonomous vehicles (AVs), accurately discerning commander intent and
executing linguistic commands within a visual context presents a significant challenge. This …

Ground then navigate: Language-guided navigation in dynamic scenes

K Jain, V Chhangani, A Tiwari… - … on Robotics and …, 2023 - ieeexplore.ieee.org
We investigate the Vision-and-Language Navigation (VLN) problem in the context of
autonomous driving in outdoor settings. We solve the problem by explicitly grounding the …

Area-keywords cross-modal alignment for referring image segmentation

H Zhang, L Wang, S Li, K Xu, B Yin - Neurocomputing, 2024 - Elsevier
Referring image segmentation aims to segment the instance corresponding to the given
language description, which requires aligning information from two modalities. Existing …

Trimodal Navigable Region Segmentation Model: Grounding Navigation Instructions in Urban Areas

N Hosomi, S Hatanaka, Y Iioka, W Yang… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
In this study, we develop a model that enables mobilities to have more friendly interactions
with users. Specifically, we focus on the referring navigable regions task in which a model …

LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

P Paul, A Garg, T Choudhary, AK Singh… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a
set of control actions as a reactive solution for closed-loop planning based on their rich …

Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images

S Sharma, UR Nair, US Parihar… - arXiv preprint arXiv …, 2022 - arxiv.org
Autonomous driving requires efficient reasoning about the location and appearance of the
different agents in the scene, which aids in downstream tasks such as object detection …

Target Position Prediction Using UNITER Regressor in a Navigation Instruction Understanding Task in Urban Areas

畑中駿平, 細見直希, 翠輝久, 山田健太郎… - … Research Meeting, 126th (2022/08), 2022 - jstage.jst.go.jp
Recent advances in vehicle automation technology are expected to improve the
interaction between humans and mobility modes. As a promising means, language …

Target Region Prediction Based on Navigation Instructions in Urban Areas

畑中駿平, 楊巍, 九曜克之, 細見直希… - … Society Annual Conference Proceedings, No. …, 2023 - jstage.jst.go.jp
Abstract: In this study, the Trimodal Navigable Region Segmentation Model (TNRSM), which can handle
three modalities (images, navigation instructions, and semantic segmentation masks) …