Double Gumbel Q-Learning

DYT Hui, AC Courville… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract: We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise
sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q …

Reinforcement learning and collective cooperation on higher-order networks

Y Xu, J Wang, J Chen, D Zhao, M Özer, C Xia… - Knowledge-Based …, 2024 - Elsevier
Collective cooperation is essential for the survival and advancement of groups. However,
current studies on evolutionary dynamics within higher-order networks often focus on …

A Meta-learning Approach to Mitigating the Estimation Bias of Q-learning

T Tan, H Xie, X Shi, M Shang - ACM Transactions on Knowledge …, 2024 - dl.acm.org
It is a long-standing problem that Q-learning suffers from overestimation bias. This issue
originates from the fact that Q-learning uses the expectation of the maximum Q-value to …
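The mechanism this snippet refers to can be illustrated with a minimal simulation (my own sketch, not code from the cited paper): when all actions have the same true value, unbiased but noisy value estimates still produce a positively biased maximum, which is the source of Q-learning's overestimation.

```python
import random
import statistics

# Sketch: n_actions with true value 0; each estimate is unbiased
# (Gaussian noise, mean 0). Averaging the max over many trials shows
# E[max of estimates] > max of true values = 0.
random.seed(0)
n_actions = 5
trials = 10_000

maxima = []
for _ in range(trials):
    estimates = [random.gauss(0.0, 1.0) for _ in range(n_actions)]
    maxima.append(max(estimates))

avg_max = statistics.mean(maxima)
print(avg_max)  # clearly positive, although every true Q-value is 0
```

Double Q-learning and the meta-learning approach above both target this gap between the expected maximum and the maximum expected value.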

Factors of Influence of the Overestimation Bias of Q-Learning

J Wagenbach, M Sabatelli - arXiv preprint arXiv:2210.05262, 2022 - arxiv.org
We study whether the learning rate $\alpha $, the discount factor $\gamma $ and the reward
signal $ r $ have an influence on the overestimation bias of the Q-Learning algorithm. Our …

Stabilizing Q-Learning for continuous control

DYT Hui - 2023 - papyrus.bib.umontreal.ca
Deep Reinforcement Learning has produced decision makers that play Chess, Go, Shogi,
Atari, and StarCraft with superhuman ability. However, unlike animals and humans, these …

[PDF][PDF] Adaptive Order Q-learning

T Tan, H Xie, D Lian - ijcai.org
This paper revisits the estimation bias control problem of Q-learning, motivated by the fact
that the estimation bias is not always harmful, i.e., some environments benefit from overestimation …