Thinkbot: Embodied instruction following with thought chain reasoning

G Lu, S Zhang, Z Wang, C Liu, J Lu, Y Tang - European Conference on …, 2025 - Springer

Performing language-conditioned robotic manipulation tasks in unstructured environments
is highly demanded for general intelligent robots. Conventional robotic manipulation …

Cited by 25 - Related articles - All 2 versions

[PDF] arxiv.org

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …

Cited by 12 - Related articles - All 4 versions

[PDF] arxiv.org

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control

Y Wang, H Zhang, J Tian, Y Tang - arXiv preprint arXiv:2412.01268, 2024 - arxiv.org

Most existing GUI agents typically depend on non-vision inputs like HTML source code or
accessibility trees, limiting their flexibility across diverse software environments and …

Embodied Instruction Following in Unknown Environments

Z Wu, Z Wang, X Xu, J Lu, H Yan - arXiv preprint arXiv:2406.11818, 2024 - arxiv.org

Enabling embodied agents to complete complex human instructions from natural language
is crucial to autonomous systems in household services. Conventional methods can only …

Cited by 3 - Related articles

[PDF] arxiv.org

Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following

S Shin, J Kim, GC Kang, BT Zhang - arXiv preprint arXiv:2404.15190, 2024 - arxiv.org

Embodied Instruction Following (EIF) is the task of executing natural language instructions
by navigating and interacting with objects in 3D environments. One of the primary …

Cited by 1 - Related articles - All 2 versions

[PDF] openreview.net

R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner

Z Bai, H Li, B Fu, C Xiong, R Wang, X Chen - openreview.net

This paper explores the potential of leveraging large language models (LLMs) as low-level
action planners capable of executing long-horizon tasks based on natural language …
