A survey of imitation learning: Algorithms, recent developments, and challenges

M Zare, PM Kebria, A Khosravi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, the development of robotics and artificial intelligence (AI) systems has been
nothing short of remarkable. As these systems continue to evolve, they are being utilized in …

Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review

M Uehara, Y Zhao, T Biancalani, S Levine - arXiv preprint arXiv …, 2024 - arxiv.org
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to
optimize downstream reward functions. While diffusion models are widely known to provide …

Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning

C Bai, L Wang, Z Yang, Z Deng, A Garg, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Hybrid RL: Using both offline and online data can make RL efficient

Y Song, Y Zhou, A Sekhari, JA Bagnell… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has
access to an offline dataset and the ability to collect experience via real-world online …

Discriminator-weighted offline imitation learning from suboptimal demonstrations

H Xu, X Zhan, H Yin, H Qin - International Conference on …, 2022 - proceedings.mlr.press
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an
optimal expert behavior policy without additional online environment interactions. Instead …

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

Provable guarantees for generative behavior cloning: Bridging low-level stability and high-level behavior

A Block, A Jadbabaie, D Pfrommer… - Advances in …, 2024 - proceedings.neurips.cc
We propose a theoretical framework for studying behavior cloning of complex expert
demonstrations using generative modeling. Our framework invokes low-level controllers …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …