Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

VELMA: Verbalization embodiment of LLM agents for vision and language navigation in street view

R Schumann, W Zhu, W Feng, TJ Fu… - Proceedings of the …, 2024 - ojs.aaai.org
Incremental decision making in real-world environments is one of the most challenging tasks
in embodied artificial intelligence. One particularly demanding scenario is Vision and …

EnvEdit: Environment editing for vision-and-language navigation

J Li, H Tan, M Bansal - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …

ChatGPT vs human-authored text: Insights into controllable text summarization and sentence style transfer

D Pu, V Demberg - arXiv preprint arXiv:2306.07799, 2023 - arxiv.org
Large-scale language models, like ChatGPT, have garnered significant media attention and
stunned the public with their remarkable capacity for generating coherent text from short …

Pathdreamer: A world model for indoor navigation

JY Koh, H Lee, Y Yang, J Baldridge… - Proceedings of the …, 2021 - openaccess.thecvf.com
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and
semantic cues to efficiently achieve their navigation goals. Towards equipping …

Vision-language navigation: a survey and taxonomy

W Wu, T Chang, X Li, Q Yin, Y Hu - Neural Computing and Applications, 2024 - Springer
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …

Diagnosing vision-and-language navigation: What really matters

W Zhu, Y Qi, P Narayana, K Sone, S Basu… - arXiv preprint arXiv …, 2021 - arxiv.org
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural
language instructions and navigates in visual environments. Multiple setups have been …

Ground then navigate: Language-guided navigation in dynamic scenes

K Jain, V Chhangani, A Tiwari… - … on Robotics and …, 2023 - ieeexplore.ieee.org
We investigate the Vision-and-Language Navigation (VLN) problem in the context of
autonomous driving in outdoor settings. We solve the problem by explicitly grounding the …

Grounding and distinguishing conceptual vocabulary through similarity learning in embodied simulations

S Ghaffari, N Krishnaswamy - arXiv preprint arXiv:2305.13668, 2023 - arxiv.org
We present a novel method for using agent experiences gathered through an embodied
simulation to ground contextualized word vectors to object representations. We use similarity …

Loc4Plan: Locating before planning for outdoor vision and language navigation

H Tian, J Meng, WS Zheng, YM Li, J Yan… - Proceedings of the 32nd …, 2024 - dl.acm.org
Vision and Language Navigation (VLN) is a challenging task that requires agents to
understand instructions and navigate to the destination in a visual environment. One of the …