Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

G Xu, R Zheng, Y Liang, X Wang, Z Yuan, T Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite
its progress, current algorithms are still unsatisfactory in virtually every aspect of the …

Learning to Manipulate Anywhere: A Visual Generalizable Framework for Reinforcement Learning

Z Yuan, T Wei, S Cheng, G Zhang, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Can we endow visuomotor robots with generalization capabilities to operate in diverse
open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework …

Focus-Then-Decide: Segmentation-Assisted Reinforcement Learning

C Chen, J Xu, W Liao, H Ding, Z Zhang, Y Yu… - Proceedings of the …, 2024 - ojs.aaai.org
Visual Reinforcement Learning (RL) is a promising approach to achieve human-like
intelligence. However, it currently faces challenges in learning efficiently within noisy …

Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

D Kim, H Lee, K Lee, D Hwang, J Choo - arXiv preprint arXiv:2406.06037, 2024 - arxiv.org
Recently, various pre-training methods have been introduced in vision-based
Reinforcement Learning (RL). However, their generalization ability remains unclear due to …

PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

C Ying, Z Hao, X Zhou, X Xu, H Su, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Designing generalizable agents capable of adapting to diverse embodiments has achieved
significant attention in Reinforcement Learning (RL), which is critical for deploying RL …

Learning the Beneficial, Forgetting the Harmful: High generalization reinforcement learning with in evolving representations

J Zheng, Y Song, G Lin, J Duan, H Lin, S Li - Neurocomputing, 2025 - Elsevier
In visual Reinforcement Learning (RL), one of the key problems is how to learn
policies, which can be generalized to unseen environments. Recently, saliency guidance …

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

J Zhang, C Bai, H He, W Xia, Z Wang, B Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of
scene understanding and action prediction. Current methods employ both 3D representation …

Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion

K Hu, Z Rui, Y He, Y Liu, P Hua, H Xu - arXiv preprint arXiv:2411.04919, 2024 - arxiv.org
Visual imitation learning methods demonstrate strong performance, yet they lack
generalization when faced with visual input perturbations, including variations in lighting …

Enhancing Visual Generalization in Reinforcement Learning with Cycling Augmentation

S Sun, J Lyu, L Li, J Guo, M Yan, R Liu, X Li - International Conference on …, 2024 - Springer
Effectively generalizing learned policies to unseen environments remains challenging in
Visual Reinforcement Learning (Visual RL). Data Augmentation (DA) is widely used in …