BYOL-Explore: Exploration by bootstrapped prediction

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

Reward-free RL is no harder than reward-aware RL in linear Markov decision processes

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Model-free representation learning and exploration in low-rank MDPs

A Modi, J Chen, A Krishnamurthy, N Jiang… - Journal of Machine …, 2024 - jmlr.org
The low-rank MDP has emerged as an important model for studying representation learning
and exploration in reinforcement learning. With a known representation, several model-free …

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …

Unified algorithms for RL with decision-estimation coefficients: No-regret, PAC, and reward-free learning

F Chen, S Mei, Y Bai - arXiv preprint arXiv:2209.11745, 2022 - arxiv.org
Finding unified complexity measures and algorithms for sample-efficient learning is a central
topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

Horizon-free reinforcement learning in polynomial time: the power of stationary policies

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2022 - proceedings.mlr.press
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes
(MDPs) that enjoys a regret bound independent of the planning horizon. Specifically …

Sample-efficient reinforcement learning with log log(T) switching cost

D Qiao, M Yin, M Min, YX Wang - … Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of reinforcement learning (RL) with low (policy) switching cost, a
problem well-motivated by real-life RL applications in which deployments of new policies …

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press
The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …