Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

A Agarwal, Y Jin, T Zhang - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …

Nearly minimax optimal reinforcement learning with linear function approximation

P Hu, Y Chen, L Huang - International Conference on …, 2022 - proceedings.mlr.press
We study reinforcement learning with linear function approximation where the transition
probability and reward functions are linear with respect to a feature mapping $\boldsymbol …

Best policy identification in linear MDPs

J Taupin, Y Jedra, A Proutiere - 2023 59th Annual Allerton …, 2023 - ieeexplore.ieee.org
We consider the problem of best policy identification in discounted Linear Markov Decision
Processes in the fixed confidence setting, under both generative and forward models. We …

Best policy identification in discounted linear MDPs

J Taupin, Y Jedra, A Proutiere - Sixteenth European Workshop on …, 2023 - openreview.net
We consider the problem of best policy identification in discounted Linear Markov Decision
Processes in the fixed confidence setting, under both generative and forward models. We …

Statistical Learning in Linearly Structured Systems: Identification, Control, and Reinforcement Learning

Y Jedra - 2023 - diva-portal.org
In this thesis, we investigate the design and statistical efficiency of learning algorithms in
systems with a linear structure. This study is carried along three main domains, namely …

Double Q-learning: New Analysis and Sharper Finite-time Bound

L Zhao, H Xiong, Y Liang, W Zhang - openreview.net
Double Q-learning (van Hasselt, 2010) has gained significant success in practice due
to its effectiveness in overcoming the overestimation issue of Q-learning. However …