ManiGaussian: Dynamic Gaussian splatting for multi-task robotic manipulation

G Lu, S Zhang, Z Wang, C Liu, J Lu, Y Tang - European Conference on …, 2025 - Springer
Performing language-conditioned robotic manipulation tasks in unstructured environments
is highly demanded for general intelligent robots. Conventional robotic manipulation …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control

Y Wang, H Zhang, J Tian, Y Tang - arXiv preprint arXiv:2412.01268, 2024 - arxiv.org
Most existing GUI agents typically depend on non-vision inputs like HTML source code or
accessibility trees, limiting their flexibility across diverse software environments and …

Embodied Instruction Following in Unknown Environments

Z Wu, Z Wang, X Xu, J Lu, H Yan - arXiv preprint arXiv:2406.11818, 2024 - arxiv.org
Enabling embodied agents to complete complex human instructions from natural language
is crucial to autonomous systems in household services. Conventional methods can only …

Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following

S Shin, J Kim, GC Kang, BT Zhang - arXiv preprint arXiv:2404.15190, 2024 - arxiv.org
Embodied Instruction Following (EIF) is the task of executing natural language instructions
by navigating and interacting with objects in 3D environments. One of the primary …

R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner

Z Bai, H Li, B Fu, C Xiong, R Wang, X Chen - openreview.net
This paper explores the potential of leveraging large language models (LLMs) as low-level
action planners capable of executing long-horizon tasks based on natural language …