Conservative Q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in Neural …, 2020 - proceedings.neurips.cc
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

Social NCE: Contrastive learning of socially-aware motion representations

Y Liu, Q Yan, A Alahi - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Learning socially-aware motion representations is at the core of recent advances in
multi-agent problems, such as human motion forecasting and robot navigation in crowds. Despite …

Disagreement-regularized imitation learning

K Brantley, W Sun, M Henaff - International Conference on Learning …, 2019 - openreview.net
We present a simple and effective algorithm designed to address the covariate shift problem
in imitation learning. It operates by training an ensemble of policies on the expert …

Toward the fundamental limits of imitation learning

N Rajaraman, L Yang, J Jiao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential
decision-making problem given only demonstrations. In this paper, we focus on understanding the …

On the value of interaction and function approximation in imitation learning

N Rajaraman, Y Han, L Yang, J Liu… - Advances in …, 2021 - proceedings.neurips.cc
We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs.
Rajaraman et al. (2020) show an information theoretic lower bound that in the worst case, a …

Robust imitation of a few demonstrations with a backwards model

JY Park, L Wong - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Behavior cloning of expert demonstrations can speed up learning optimal policies in a more
sample-efficient way over reinforcement learning. However, the policy cannot extrapolate …

Robust Multi-Domain Multi-Turn Dialogue Policy via Student-Teacher Offline Reinforcement Learning

M Rohmatillah, JT Chien - APSIPA Transactions on Signal …, 2024 - nowpublishers.com
Dialogue policy plays a crucial role in a dialogue system as it determines the system
response given a user input. In a pipeline system, the dialogue policy is susceptible to the …

Doubly constrained offline reinforcement learning for learning path recommendation

Y Yun, H Dai, R An, Y Zhang, X Shang - Knowledge-Based Systems, 2024 - Elsevier
Learning path recommendation refers to the application of interactive recommendation
systems in the field of education, aimed at optimizing learning outcomes while minimizing …

Provably breaking the quadratic error compounding barrier in imitation learning, optimally

N Rajaraman, Y Han, LF Yang, K Ramchandran… - arXiv preprint arXiv …, 2021 - arxiv.org
We study the statistical limits of Imitation Learning (IL) in episodic Markov Decision
Processes (MDPs) with a state space $\mathcal{S}$. We focus on the known-transition …

HIPODE: Enhancing Offline Reinforcement Learning with High-Quality Synthetic Data from a Policy-Decoupled Approach

S Lian, Y Ma, J Liu, Y Zheng, Z Meng - arXiv preprint arXiv:2306.06329, 2023 - arxiv.org
Offline reinforcement learning (ORL) has gained attention as a means of training
reinforcement learning models using pre-collected static data. To address the issue of …