NS Lesmana, H Su, CS Pun - Transactions on Machine Learning …, 2022 - openreview.net
Policy iteration (PI) is a fundamental policy search algorithm in the standard reinforcement learning (RL) setting, which can be shown to converge to an optimal policy by policy …
T Tan, H Xie, L Feng - Information Sciences, 2024 - Elsevier
A variety of algorithms has been proposed to mitigate the overestimation bias of Q-learning. These algorithms reduce the estimation of maximum Q-value, i.e., homogeneous update. As …
H Na, D Lee - arXiv preprint arXiv:2406.09946, 2024 - arxiv.org
$Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in …
D Lee, DW Kim - arXiv preprint arXiv:2204.10479, 2022 - arxiv.org
TD-learning is a fundamental algorithm in the field of reinforcement learning (RL) that is employed to evaluate a given policy by estimating the corresponding value function for a …
(RL). The agent acts greedily with respect to an estimate of the value plus what can be seen as a value bonus. The value bonus can be learned by estimating a value function on reward …