Planning in Markov decision processes with gap-dependent sample complexity

L Lin, L Pan, S Liu - Computers in Industry, 2022 - Elsevier

The burgeoning development of the cloud market has promoted the expansion of resources
held by cloud providers, but the resulting underutilization caused by the over-provisioned …

Cited by 15 · Related articles · All 2 versions

[PDF] mlr.press

Fast active learning for pure exploration in reinforcement learning

P Ménard, OD Domingues, A Jonsson… - International …, 2021 - proceedings.mlr.press

Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, the feedback, in the beginning, can be completely absent …

Cited by 82 · Related articles · All 7 versions

[PDF] neurips.cc

Instance-dependent near-optimal policy identification in linear MDPs via online experiment design

A Wagenmaker, KG Jamieson - Advances in Neural …, 2022 - proceedings.neurips.cc

While much progress has been made in understanding the minimax sample complexity of
reinforcement learning (RL)---the complexity of learning on the "worst-case" instance---such …

Cited by 29 · Related articles · All 6 versions

[PDF] mlr.press

Adaptive reward-free exploration

E Kaufmann, P Ménard… - Algorithmic …, 2021 - proceedings.mlr.press

Reward-free exploration is a reinforcement learning setting recently studied by (Jin et al.
2020), who address it by running several algorithms with regret guarantees in parallel. In our …

Cited by 90 · Related articles · All 9 versions

[PDF] neurips.cc

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc

In some applications of reinforcement learning, a dataset of pre-collected experience is
already available but it is also possible to acquire some additional online data to help …

Cited by 4 · Related articles · All 6 versions

[PDF] jmlr.org

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org

This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …

Cited by 122 · Related articles · All 12 versions

[PDF] mlr.press

Towards theoretical understanding of inverse reinforcement learning

AM Metelli, F Lazzati, M Restelli - … Conference on Machine …, 2023 - proceedings.mlr.press

Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a
reward function justifying the behavior demonstrated by an expert agent. A well-known …

Cited by 13 · Related articles · All 10 versions

[PDF] mlr.press

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press

The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …

Cited by 38 · Related articles · All 4 versions

[PDF] mlr.press

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press

We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Cited by 9 · Related articles · All 9 versions

[PDF] neurips.cc

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc

We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A …

Cited by 9 · Related articles · All 10 versions
