Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent …
While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL)---the complexity of learning on the "worst-case" instance---such …
Reward-free exploration is a reinforcement learning setting recently studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our …
R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is already available, but it is also possible to acquire some additional online data to help …
E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the …
Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function justifying the behavior demonstrated by an expert agent. A well-known …
The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …
We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the …
We consider reinforcement learning in an environment modeled by an episodic, tabular, step-dependent Markov decision process of horizon $H$ with $S$ states, and $ A …