BYOL-Explore: Exploration by bootstrapped prediction

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

Reward-free RL is no harder than reward-aware RL in linear Markov decision processes

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Model-free representation learning and exploration in low-rank MDPs

A Modi, J Chen, A Krishnamurthy, N Jiang… - Journal of Machine …, 2024 - jmlr.org
The low-rank MDP has emerged as an important model for studying representation learning
and exploration in reinforcement learning. With a known representation, several model-free …

Computationally efficient horizon-free reinforcement learning for linear mixture MDPs

D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult
than bandits, even with a long planning horizon and unknown state transitions. However …

Unified algorithms for RL with decision-estimation coefficients: No-regret, PAC, and reward-free learning

F Chen, S Mei, Y Bai - arXiv preprint arXiv:2209.11745, 2022 - arxiv.org
Finding unified complexity measures and algorithms for sample-efficient learning is a central
topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

Horizon-free reinforcement learning in polynomial time: the power of stationary policies

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2022 - proceedings.mlr.press
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes
(MDPs) that enjoys a regret bound independent of the planning horizon. Specifically …

Sample-efficient reinforcement learning with log log(T) switching cost

D Qiao, M Yin, M Min, YX Wang - … Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of reinforcement learning (RL) with low (policy) switching cost, a
problem well-motivated by real-life RL applications in which deployments of new policies …

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press
The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …