We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally …
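For context, the condition referred to here can be sketched as follows (notation assumed, following the standard episodic formulation rather than the truncated abstract): for function classes $\mathcal{Q}_h$ and Bellman operators $\mathcal{T}_h$, the inherent Bellman error is

$$\mathcal{I} = \max_{h}\; \sup_{Q' \in \mathcal{Q}_{h+1}}\; \inf_{Q \in \mathcal{Q}_h}\; \bigl\| Q - \mathcal{T}_h Q' \bigr\|_{\infty}, \qquad (\mathcal{T}_h Q')(s,a) = r_h(s,a) + \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\bigl[\max_{a'} Q'(s',a')\bigr].$$

Low inherent Bellman error means the class is nearly closed under the Bellman operator: backing up any representable value function stays (approximately) representable.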
P Hu, Y Chen, L Huang - International Conference on …, 2022 - proceedings.mlr.press
We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}$ …
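As a sketch of the linearity assumption in this snippet (the symbols $\boldsymbol{\theta}_h$ and $\boldsymbol{\mu}_h$ are standard linear-MDP notation, assumed here since the abstract is cut off):

$$r_h(s,a) = \boldsymbol{\phi}(s,a)^{\top}\boldsymbol{\theta}_h, \qquad \mathbb{P}_h(s' \mid s,a) = \boldsymbol{\phi}(s,a)^{\top}\boldsymbol{\mu}_h(s'),$$

with $\boldsymbol{\phi}:\mathcal{S}\times\mathcal{A}\to\mathbb{R}^d$ a known feature mapping and $\boldsymbol{\theta}_h$, $\boldsymbol{\mu}_h$ unknown. A standard consequence is that every policy's action-value function is itself linear in $\boldsymbol{\phi}$, which is what makes least-squares value iteration tractable in this setting.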
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved …
A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical …
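To make the "ease of use" point concrete, here is a minimal Thompson sampling sketch for a Bernoulli bandit with Beta-Bernoulli conjugate updates (the environment, arm means, and horizon below are illustrative assumptions, not taken from the paper):

    import numpy as np

    def thompson_sampling(true_means, horizon, seed=0):
        rng = np.random.default_rng(seed)
        k = len(true_means)
        alpha = np.ones(k)  # posterior Beta(alpha, beta) per arm,
        beta = np.ones(k)   # starting from a uniform prior
        total = 0.0
        for _ in range(horizon):
            theta = rng.beta(alpha, beta)   # sample one plausible mean per arm
            arm = int(np.argmax(theta))     # play the arm that looks best under the sample
            reward = rng.binomial(1, true_means[arm])
            alpha[arm] += reward            # conjugate posterior update
            beta[arm] += 1 - reward
            total += reward
        return total

    # Example run: TS quickly concentrates play on the 0.7 arm.
    print(thompson_sampling([0.3, 0.5, 0.7], horizon=5000))

The whole algorithm is posterior sampling plus a greedy step, which is the ease of use the snippet refers to.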
C Zhao, R Yang, B Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study low-rank MDPs with adversarially changed losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a …
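The truncated sentence presumably continues with the standard low-rank factorization; as a hedged sketch of that assumption (notation assumed):

$$P(s' \mid s,a) = \boldsymbol{\phi}(s,a)^{\top}\boldsymbol{\mu}(s'),$$

where, unlike in the linear-MDP setting above, both $\boldsymbol{\phi}$ and $\boldsymbol{\mu}$ are unknown, so the learner must do representation learning on top of exploration.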
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We present a refined analysis of the …
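For reference, the quantity being bounded is the prior-averaged regret; in the episodic formulation (a standard definition, assumed here):

$$\mathrm{BayesRegret}(K) = \mathbb{E}\Bigl[\sum_{k=1}^{K} \bigl(V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k})\bigr)\Bigr],$$

where the expectation is taken over the prior on the MDP, the initial states, and the algorithm's randomization.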
We study reinforcement learning (RL) with linear function approximation, unknown transition, and adversarial losses in the bandit feedback setting. Specifically, the unknown …
Learning a near-optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic …
V Boone, B Gaujal - International Conference on Machine …, 2023 - proceedings.mlr.press
The first contribution of this paper is the introduction of a new performance measure for an RL algorithm, more discriminating than the regret, which we call the regret of exploration, that …