GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

H Liao, H Shen, Z Li, C Wang, G Li, Y Bie… - … in Transportation Research, 2024 - Elsevier
In the field of autonomous vehicles (AVs), accurately discerning commander intent and
executing linguistic commands within a visual context presents a significant challenge. This …
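The cross-modal attention named in this title can be illustrated with a minimal, generic sketch: language-derived queries attend over image features so that each token of a command is grounded in the visual regions most relevant to it. This is an illustrative assumption, not the architecture of the cited paper; the class name, feature dimensions, and usage below are hypothetical.

```python
# Generic cross-modal attention sketch (illustrative only, not the cited paper's model).
# Text tokens act as queries; image patch features act as keys/values.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, text_dim=768, image_dim=1024, embed_dim=512, num_heads=8):
        super().__init__()
        self.q_proj = nn.Linear(text_dim, embed_dim)    # project language features to queries
        self.kv_proj = nn.Linear(image_dim, embed_dim)  # project image features to keys/values
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, num_tokens, text_dim)   e.g. token embeddings of a command
        # image_feats: (batch, num_patches, image_dim) e.g. visual backbone patch features
        q = self.q_proj(text_feats)
        kv = self.kv_proj(image_feats)
        grounded, weights = self.attn(q, kv, kv)  # each token attends over image patches
        return grounded, weights                  # weights show which regions ground each token

# Usage with random tensors standing in for real features
if __name__ == "__main__":
    text = torch.randn(2, 12, 768)
    image = torch.randn(2, 196, 1024)
    out, attn_w = CrossModalAttention()(text, image)
    print(out.shape, attn_w.shape)  # (2, 12, 512), (2, 12, 196)
```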

Deep learning-based natural language processing in human-agent interaction: Applications, advancements and challenges

N Ahmed, AK Saha, MA Al Noman, JR Jim… - Natural Language …, 2024 - Elsevier
Human-Agent Interaction is at the forefront of rapid development, with the integration of
deep learning techniques into natural language processing representing significant …

Ground then navigate: Language-guided navigation in dynamic scenes

K Jain, V Chhangani, A Tiwari… - … on Robotics and …, 2023 - ieeexplore.ieee.org
We investigate the Vision-and-Language Navigation (VLN) problem in the context of
autonomous driving in outdoor settings. We solve the problem by explicitly grounding the …

Area-keywords cross-modal alignment for referring image segmentation

H Zhang, L Wang, S Li, K Xu, B Yin - Neurocomputing, 2024 - Elsevier
Referring image segmentation aims to segment the instance corresponding to the given
language description, which requires aligning information from two modalities. Existing …

Multimodal Target Localization with Landmark-Aware Positioning for Urban Mobility

N Hosomi, Y Iioka, S Hatanaka, T Misu… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Advancements in vehicle automation technology are expected to significantly impact how
humans interact with vehicles. In this study, we propose a method to create user-friendly …

DLO Perceiver: Grounding Large Language Model for Deformable Linear Objects Perception

A Caporali, K Galassi, G Palli - IEEE Robotics and Automation …, 2024 - ieeexplore.ieee.org
The perception of Deformable Linear Objects (DLOs) is a challenging task due to their
complex and ambiguous appearance, lack of discernible features, typically small sizes, and …

Learning Autonomous Driving Tasks via Human Feedbacks with Large Language Models

Y Ma, X Cao, W Ye, C Cui, K Mei… - Findings of the …, 2024 - aclanthology.org
Traditional autonomous driving systems have mainly focused on making driving decisions
without human interaction, overlooking human-like decision-making and human preference …

Trimodal Navigable Region Segmentation Model: Grounding Navigation Instructions in Urban Areas

N Hosomi, S Hatanaka, Y Iioka, W Yang… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
In this study, we develop a model that enables mobility systems to interact with users in a
friendlier manner. Specifically, we focus on the referring navigable regions task, in which a model …

LASMP: Language Aided Subset Sampling Based Motion Planner

S Bhattacharjee, A Sinha, C Ekenna - arXiv preprint arXiv:2410.00649, 2024 - arxiv.org
This paper presents the Language Aided Subset Sampling Based Motion Planner (LASMP),
a system that helps mobile robots plan their movements by using natural language …

LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

P Paul, A Garg, T Choudhary, AK Singh… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a
set of control actions as a reactive solution for closed-loop planning based on their rich …