A survey of imitation learning: Algorithms, recent developments, and challenges

M Zare, PM Kebria, A Khosravi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, the development of robotics and artificial intelligence (AI) systems has been
nothing short of remarkable. As these systems continue to evolve, they are being utilized in …

Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review

M Uehara, Y Zhao, T Biancalani, S Levine - arXiv preprint arXiv …, 2024 - arxiv.org
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to
optimize downstream reward functions. While diffusion models are widely known to provide …

Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning

C Bai, L Wang, Z Yang, Z Deng, A Garg, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Hybrid RL: Using both offline and online data can make RL efficient

Y Song, Y Zhou, A Sekhari, JA Bagnell… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has
access to an offline dataset and the ability to collect experience via real-world online …

Discriminator-weighted offline imitation learning from suboptimal demonstrations

H Xu, X Zhan, H Yin, H Qin - International Conference on …, 2022 - proceedings.mlr.press
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an
optimal expert behavior policy without additional online environment interactions. Instead …

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

Provable guarantees for generative behavior cloning: Bridging low-level stability and high-level behavior

A Block, A Jadbabaie, D Pfrommer… - Advances in …, 2024 - proceedings.neurips.cc
We propose a theoretical framework for studying behavior cloning of complex expert
demonstrations using generative modeling. Our framework invokes low-level controllers …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …