Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Almost optimal model-free reinforcement learning via reference-advantage decomposition

Z Zhang, Y Zhou, X Ji - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …

Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2021 - proceedings.mlr.press
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Learning adversarial Markov decision processes with bandit feedback and unknown transition

C Jin, T Jin, H Luo, S Sra, T Yu - International Conference on …, 2020 - proceedings.mlr.press
We consider the task of learning in episodic finite-horizon Markov decision processes with
an unknown transition function, bandit feedback, and adversarial losses. We propose an …

Understanding domain randomization for sim-to-real transfer

X Chen, J Hu, C Jin, L Li, L Wang - arXiv preprint arXiv:2110.03239, 2021 - arxiv.org
Reinforcement learning encounters many challenges when applied directly in the real world.
Sim-to-real transfer is widely used to transfer the knowledge learned from simulation to the …

Dueling RL: Reinforcement learning with trajectory preferences

A Saha, A Pacchiano, J Lee - International Conference on …, 2023 - proceedings.mlr.press
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …