Finite-time analysis of single-timescale actor-critic

X Chen, L Zhao - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Actor-critic methods have achieved significant success in many challenging applications.
However, their finite-time convergence is still poorly understood in the most practical single …

Reinventing Policy Iteration under Time Inconsistency

NS Lesmana, H Su, CS Pun - Transactions on Machine Learning …, 2022 - openreview.net
Policy iteration (PI) is a fundamental policy search algorithm in standard reinforcement
learning (RL) setting, which can be shown to converge to an optimal policy by policy …

Q-learning with heterogeneous update strategy

T Tan, H Xie, L Feng - Information Sciences, 2024 - Elsevier
A variety of algorithms has been proposed to mitigate the overestimation bias of Q-learning.
These algorithms reduce the estimation of maximum Q-value, i.e., homogeneous update. As …

Finite-Time Analysis of Simultaneous Double Q-learning

H Na, D Lee - arXiv preprint arXiv:2406.09946, 2024 - arxiv.org
Q-learning is one of the most fundamental reinforcement learning (RL) algorithms.
Despite its widespread success in various applications, it is prone to overestimation bias in …

Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective

D Lee, DW Kim - arXiv preprint arXiv:2204.10479, 2022 - arxiv.org
TD-learning is a fundamental algorithm in the field of reinforcement learning (RL), that is
employed to evaluate a given policy by estimating the corresponding value function for a …

Value Bonuses Using Ensemble Errors For Exploration in Reinforcement Learning

A Wahab - 2024 - era.library.ualberta.ca
(RL). The agent acts greedily with respect to an estimate of the value plus what can be seen
as a value bonus. The value bonus can be learned by estimating a value function on reward …