Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency

H Zhao, J He, D Zhou, T Zhang… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …

On the interplay between misspecification and sub-optimality gap in linear contextual bandits

W Zhang, J He, Z Fan, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study linear contextual bandits in the misspecified setting, where the expected reward
function can be approximated by a linear function class up to a bounded misspecification …

Target Network and Truncation Overcome the Deadly Triad in Q-Learning

Z Chen, JP Clarke, ST Maguluri - SIAM Journal on Mathematics of Data …, 2023 - SIAM
Q-learning with function approximation is one of the most empirically successful while
theoretically mysterious reinforcement learning (RL) algorithms and was identified in [RS …

A Doubly Robust Approach to Sparse Reinforcement Learning

W Kim, G Iyengar, A Zeevi - International Conference on …, 2024 - proceedings.mlr.press
We propose a new regret minimization algorithm for episodic sparse linear Markov decision
process (SMDP) where the state-transition distribution is a linear function of observed …

On the sample complexity of learning infinite-horizon discounted linear kernel MDPs

Y Chen, J He, Q Gu - International Conference on Machine …, 2022 - proceedings.mlr.press
We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where
the transition probability function is linear in a predefined feature mapping. Existing …

Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL

A Ghosh, X Zhou, N Shroff - International Conference on …, 2024 - proceedings.mlr.press
We study the constrained Markov decision processes (CMDPs), in which an agent aims to
maximize the expected cumulative reward subject to a constraint on the expected total value …

Uniform-PAC guarantees for model-based RL with bounded eluder dimension

Y Wu, J He, Q Gu - Uncertainty in Artificial Intelligence, 2023 - proceedings.mlr.press
Recently, there has been remarkable progress in reinforcement learning (RL) with general
function approximation. However, all these works only provide regret or sample complexity …

Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee

J Liu, Y Li, L Yang - arXiv preprint arXiv:2402.12711, 2024 - arxiv.org
Existing performance measures for bandit algorithms such as regret, PAC bounds, or
uniform-PAC (Dann et al., 2017), typically evaluate the cumulative performance, while …

Settling Constant Regrets in Linear Markov Decision Processes

W Zhang, Z Fan, J He, Q Gu - arXiv preprint arXiv:2404.10745, 2024 - arxiv.org
We study the constant regret guarantees in reinforcement learning (RL). Our objective is to
design an algorithm that incurs only finite regret over infinite episodes with high probability …

Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

Z Wang, J Xie, Y Chen, J Lui, D Zhou - arXiv preprint arXiv:2403.10732, 2024 - arxiv.org
We investigate the non-stationary stochastic linear bandit problem where the reward
distribution evolves each round. Existing algorithms characterize the non-stationarity by the …