Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods

Y Cao, H Zhao, Y Cheng, T Shu, Y Chen… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

SpatialVLM: Endowing vision-language models with spatial reasoning capabilities

B Chen, Z Xu, S Kirmani, B Ichter… - Proceedings of the …, 2024 - openaccess.thecvf.com
Understanding and reasoning about spatial relationships is crucial for Visual Question
Answering (VQA) and robotics. Vision Language Models (VLMs) have shown impressive …

A survey of optimization-based task and motion planning: From classical to learning approaches

Z Zhao, S Cheng, Y Ding, Z Zhou… - IEEE/ASME …, 2024 - ieeexplore.ieee.org
Task and motion planning (TAMP) integrates high-level task planning and low-level motion
planning to equip robots with the autonomy to effectively reason over long-horizon, dynamic …

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

J Cui, T Liu, N Liu, Y Yang, Y Zhu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Traditional approaches in physics-based motion generation, centered around imitation
learning and reward shaping, often struggle to adapt to new scenarios. To tackle this …

Vision-language models as a source of rewards

K Baumli, S Baveja, F Behbahani, H Chan… - arXiv preprint arXiv …, 2023 - arxiv.org
Building generalist agents that can accomplish many goals in rich open-ended
environments is one of the research frontiers for reinforcement learning. A key limiting factor …

RL-VLM-F: Reinforcement learning from vision language foundation model feedback

Y Wang, Z Sun, J Zhang, Z Xian, E Biyik, D Held… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward engineering has long been a challenge in Reinforcement Learning (RL) research,
as it often requires extensive human effort and iterative processes of trial-and-error to design …

Generative AI for self-adaptive systems: State of the art and research roadmap

J Li, M Zhang, N Li, D Weyns, Z Jin, K Tei - ACM Transactions on …, 2024 - dl.acm.org
Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a
feedback loop with four core functionalities: monitoring, analyzing, planning, and execution …

FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models

Z Zhang, Y Li, H Huang, M Lin, L Yi - European Conference on Computer …, 2025 - Springer
Human motion synthesis is a fundamental task in computer animation. Despite recent
progress in this field utilizing deep learning and motion capture data, existing methods are …

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Y Zhai, H Bai, Z Lin, J Pan, S Tong, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following
data have exhibited impressive language reasoning capabilities across various scenarios …