Uniform-PAC bounds for reinforcement learning with linear function approximation

Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency

H Zhao, J He, D Zhou, T Zhang… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …

Cited by 30 · Related articles · All 5 versions

[PDF] mlr.press

On the interplay between misspecification and sub-optimality gap in linear contextual bandits

W Zhang, J He, Z Fan, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study linear contextual bandits in the misspecified setting, where the expected reward
function can be approximated by a linear function class up to a bounded misspecification …

Cited by 10 · Related articles · All 7 versions

[PDF] arxiv.org

Target Network and Truncation Overcome the Deadly Triad in Q-Learning

Z Chen, JP Clarke, ST Maguluri - SIAM Journal on Mathematics of Data …, 2023 - SIAM

Q-learning with function approximation is one of the most empirically successful while
theoretically mysterious reinforcement learning (RL) algorithms and was identified in [RS …

Cited by 23 · Related articles · All 3 versions

[PDF] mlr.press

A Doubly Robust Approach to Sparse Reinforcement Learning

W Kim, G Iyengar, A Zeevi - International Conference on …, 2024 - proceedings.mlr.press

We propose a new regret minimization algorithm for episodic sparse linear Markov decision
process (SMDP) where the state-transition distribution is a linear function of observed …

Cited by 2 · Related articles · All 3 versions

[PDF] mlr.press

On the sample complexity of learning infinite-horizon discounted linear kernel MDPs

Y Chen, J He, Q Gu - International Conference on Machine …, 2022 - proceedings.mlr.press

We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where
the transition probability function is linear in a predefined feature mapping. Existing …

Cited by 8 · Related articles · All 4 versions

[PDF] mlr.press

Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL

A Ghosh, X Zhou, N Shroff - International Conference on …, 2024 - proceedings.mlr.press

We study the constrained Markov decision processes (CMDPs), in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …

Cited by 2 · Related articles

[PDF] mlr.press

Uniform-PAC guarantees for model-based RL with bounded eluder dimension

Y Wu, J He, Q Gu - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press

Recently, there has been remarkable progress in reinforcement learning (RL) with general
function approximation. However, all these works only provide regret or sample complexity …

Cited by 4 · Related articles · All 7 versions

[PDF] arxiv.org
