Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Evolving graphical planner: Contextual global planning for vision-and-language navigation

Z Deng, K Narasimhan… - Advances in Neural …, 2020 - proceedings.neurips.cc
The ability to perform effective planning is crucial for building an instruction-following agent.
When navigating through a new environment, an agent is challenged with (1) connecting the …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …

Vision-language navigation: a survey and taxonomy

W Wu, T Chang, X Li, Q Yin, Y Hu - Neural Computing and Applications, 2024 - Springer
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …

Language-guided navigation via cross-modal grounding and alternate adversarial learning

W Zhang, C Ma, Q Wu, X Yang - IEEE Transactions on Circuits …, 2020 - ieeexplore.ieee.org
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate
an agent to the target location in unseen photo-realistic environments according to the given …

Improving cross-modal alignment in vision language navigation via syntactic information

J Li, H Tan, M Bansal - arXiv preprint arXiv:2104.09580, 2021 - arxiv.org
Vision language navigation is the task that requires an agent to navigate through a 3D
environment based on natural language instructions. One key challenge in this task is to …

Deep learning for embodied vision navigation: A survey

F Zhu, Y Zhu, V Lee, X Liang, X Chang - arXiv preprint arXiv:2108.04097, 2021 - arxiv.org
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment
mainly rely on its first-person observation. This problem has attracted rising attention in …

CLEAR: Improving vision-language navigation with cross-lingual, environment-agnostic representations

J Li, H Tan, M Bansal - arXiv preprint arXiv:2207.02185, 2022 - arxiv.org
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the
environment based on language instructions. In this paper, we aim to solve two key …

MLANet: Multi-level attention network with sub-instruction for continuous vision-and-language navigation

Z He, L Wang, S Li, Q Yan, C Liu, Q Chen - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-and-Language Navigation (VLN) aims to develop intelligent agents to navigate in
unseen environments only through language and vision supervision. In the recently …

Vision-Language Navigation with Embodied Intelligence: A Survey

P Gao, P Wang, F Gao, F Wang, R Yuan - arXiv preprint arXiv:2402.14304, 2024 - arxiv.org
As a long-term vision in the field of artificial intelligence, the core goal of embodied
intelligence is to improve the perception, understanding, and interaction capabilities of …