Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning

H He, C Bai, K Xu, Z Yang, W Zhang… - Advances in neural …, 2023 - proceedings.neurips.cc
Diffusion models have demonstrated highly-expressive generative capabilities in vision and
NLP. Recent studies in reinforcement learning (RL) have shown that diffusion models are …

OpenChat: Advancing open-source language models with mixed-quality data

G Wang, S Cheng, X Zhan, X Li, S Song… - arXiv preprint arXiv …, 2023 - arxiv.org
Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …

Advancements in humanoid robots: A comprehensive review and future prospects

Y Tong, H Liu, Z Zhang - IEEE/CAA Journal of Automatica …, 2024 - ieeexplore.ieee.org
This paper provides a comprehensive review of the current status, advancements, and future
prospects of humanoid robots, highlighting their significance in driving the evolution of next …

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types:
RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

MimicGen: A data generation system for scalable robot learning using human demonstrations

A Mandlekar, S Nasiriany, B Wen, I Akinola… - arXiv preprint arXiv …, 2023 - arxiv.org
Imitation learning from a large set of human demonstrations has proved to be an effective
paradigm for building capable robot agents. However, the demonstrations can be extremely …

Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets

ZW Hong, A Kumar, S Karnik… - Advances in …, 2023 - proceedings.neurips.cc
Offline reinforcement learning (RL) enables learning a decision-making policy without
interaction with the environment. This makes it particularly beneficial in situations where …

Robot learning on the job: Human-in-the-loop autonomy and learning during deployment

H Liu, S Nasiriany, L Zhang, Z Bao… - … International Journal of …, 2022 - journals.sagepub.com
With the rapid growth of computing powers and recent advances in deep learning, we have
witnessed impressive demonstrations of novel robot capabilities in research settings …

Imitation learning from imperfection: Theoretical justifications and algorithms

Z Li, T Xu, Z Qin, Y Yu, ZQ Luo - Advances in Neural …, 2024 - proceedings.neurips.cc
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for
sequential decision-making tasks. But, their effectiveness is hampered when faced with …

Policy expansion for bridging offline-to-online reinforcement learning

H Zhang, W Xu, H Yu - arXiv preprint arXiv:2302.00935, 2023 - arxiv.org
Pre-training with offline data and online fine-tuning using reinforcement learning is a
promising strategy for learning control policies by leveraging the best of both worlds in terms …