Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Nearly minimax optimal reinforcement learning with linear function approximation

P Hu, Y Chen, L Huang - International Conference on …, 2022 - proceedings.mlr.press
We study reinforcement learning with linear function approximation where the transition
probability and reward functions are linear with respect to a feature mapping $\boldsymbol …

Tactical optimism and pessimism for deep reinforcement learning

T Moskovitz, J Parker-Holder… - Advances in …, 2021 - proceedings.neurips.cc
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to
reinforcement learning for continuous control. One of the primary drivers of this improved …

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …

Learning adversarial low-rank Markov decision processes with unknown transition and full-information feedback

C Zhao, R Yang, B Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the low-rank MDPs with adversarially changed losses in the full-
information feedback setting. In particular, the unknown transition probability kernel admits a …

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Learning adversarial linear mixture Markov decision processes with bandit feedback and unknown transition

C Zhao, R Yang, B Wang, S Li - The Eleventh International …, 2023 - openreview.net
We study reinforcement learning (RL) with linear function approximation, unknown
transition, and adversarial losses in the bandit feedback setting. Specifically, the unknown …

Reinforcement learning in reward-mixing MDPs

J Kwon, Y Efroni, C Caramanis… - Advances in Neural …, 2021 - proceedings.neurips.cc
Learning a near optimal policy in a partially observable system remains an elusive
challenge in contemporary reinforcement learning. In this work, we consider episodic …

The regret of exploration and the control of bad episodes in reinforcement learning

V Boone, B Gaujal - International Conference on Machine …, 2023 - proceedings.mlr.press
The first contribution of this paper is the introduction of a new performance measure of an RL
algorithm that is more discriminating than the regret, that we call the regret of exploration that …