Near-optimal optimistic reinforcement learning using empirical Bernstein inequalities

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by 244 · Related articles · All 7 versions

[PDF] mlr.press

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press

We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Cited by 256 · Related articles · All 5 versions

[PDF] mlr.press

Nearly minimax optimal reinforcement learning with linear function approximation

P Hu, Y Chen, L Huang - International Conference on …, 2022 - proceedings.mlr.press

We study reinforcement learning with linear function approximation where the transition
probability and reward functions are linear with respect to a feature mapping $\boldsymbol …

Cited by 37 · Related articles · All 4 versions

[PDF] neurips.cc

Tactical optimism and pessimism for deep reinforcement learning

T Moskovitz, J Parker-Holder… - Advances in …, 2021 - proceedings.neurips.cc

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to
reinforcement learning for continuous control. One of the primary drivers of this improved …

Cited by 59 · Related articles · All 15 versions

[PDF] mlr.press

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

A Karbasi, NL Kuang, Y Ma… - … Conference on Machine …, 2023 - proceedings.mlr.press

Thompson sampling (TS) is widely used in sequential decision making due to its ease of use
and appealing empirical performance. However, many existing analytical and empirical …

Cited by 7 · Related articles · All 7 versions

[PDF] neurips.cc

Learning adversarial low-rank Markov decision processes with unknown transition and full-information feedback

C Zhao, R Yang, B Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this work, we study the low-rank MDPs with adversarially changed losses in the full-
information feedback setting. In particular, the unknown transition probability kernel admits a …

Cited by 4 · Related articles · All 6 versions

[PDF] neurips.cc

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc

In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Cited by 4 · Related articles · All 6 versions

[PDF] openreview.net

Learning adversarial linear mixture Markov decision processes with bandit feedback and unknown transition

C Zhao, R Yang, B Wang, S Li - The Eleventh International …, 2023 - openreview.net

We study reinforcement learning (RL) with linear function approximation, unknown
transition, and adversarial losses in the bandit feedback setting. Specifically, the unknown …

Cited by 11 · Related articles

[PDF] neurips.cc

Reinforcement learning in reward-mixing MDPs

J Kwon, Y Efroni, C Caramanis… - Advances in Neural …, 2021 - proceedings.neurips.cc

Learning a near optimal policy in a partially observable system remains an elusive
challenge in contemporary reinforcement learning. In this work, we consider episodic …

Cited by 21 · Related articles · All 7 versions

[PDF] mlr.press

The regret of exploration and the control of bad episodes in reinforcement learning

V Boone, B Gaujal - International Conference on Machine …, 2023 - proceedings.mlr.press

The first contribution of this paper is the introduction of a new performance measure of an RL
algorithm that is more discriminating than the regret, that we call the regret of exploration that …

Cited by 2 · Related articles · All 12 versions