Vision-and-Language Navigation (VLN) has gained increasing attention in recent years, and many approaches have emerged to advance its development. The remarkable …
Y Wang, H Zhang, J Tian, Y Tang - arXiv preprint arXiv:2412.01268, 2024 - arxiv.org
Most existing GUI agents depend on non-vision inputs like HTML source code or accessibility trees, limiting their flexibility across diverse software environments and …
Enabling embodied agents to complete complex human instructions from natural language is crucial to autonomous systems in household services. Conventional methods can only …
Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in 3D environments. One of the primary …
Z Bai, H Li, B Fu, C Xiong, R Wang, X Chen - openreview.net
This paper explores the potential of leveraging large language models (LLMs) as low-level action planners capable of executing long-horizon tasks based on natural language …