Deep reinforcement learning with double Q-learning

H Van Hasselt, A Guez, D Silver - … of the AAAI conference on artificial …, 2016 - ojs.aaai.org
The popular Q-learning algorithm is known to overestimate action values under certain
conditions. It was not previously known whether, in practice, such overestimations are …
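The overestimation comes from taking a maximum over noisy estimates. That maximum is biased upward even when each individual estimate is unbiased. Below is a minimal Python sketch of the effect in a toy setting. It assumes every action has true value 0 and every estimate carries Gaussian noise, and all parameter names are illustrative. The sketch compares the single max estimator used in the Q-learning target with the double estimator that double Q-learning builds on, which selects the action with one set of estimates and evaluates it with an independent set.

```python
import random


def max_bias_demo(n_actions=10, n_samples=5, trials=2000, noise=1.0, seed=0):
    """Estimate max_a E[X_a] when every action's true value is 0 (true max is 0).

    Returns the average of the single estimator (max over one set of sample means)
    and of the double estimator (argmax from set A, value read from independent set B).
    """
    rng = random.Random(seed)

    def sample_means():
        # One noisy sample-mean estimate per action; each has true mean 0.
        return [
            sum(rng.gauss(0.0, noise) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]

    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        est_a = sample_means()
        est_b = sample_means()
        # Single estimator: max over noisy estimates, as in the Q-learning target.
        single_total += max(est_a)
        # Double estimator: select the action with est_a, evaluate it with est_b.
        best = max(range(n_actions), key=lambda i: est_a[i])
        double_total += est_b[best]
    return single_total / trials, double_total / trials


single, double = max_bias_demo()
print(f"single estimator: {single:.3f}, double estimator: {double:.3f}")
```

With these settings the single estimator lands well above the true value of 0, while the double estimator stays close to it. When the true action values differ, the double estimator can instead underestimate. This trade-off motivates combining the two, as in the weighted double Q-learning entry below.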

PILCO: A model-based and data-efficient approach to policy search

M Deisenroth, CE Rasmussen - Proceedings of the 28th …, 2011 - aiweb.cs.washington.edu
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search
method. PILCO reduces model bias, one of the key problems of model-based reinforcement …

[BOOK] Reinforcement learning: An introduction

RS Sutton, AG Barto - 2018 - books.google.com
The significantly expanded and updated new edition of a widely used text on reinforcement
learning, one of the most active research areas in artificial intelligence. Reinforcement …

Learning values across many orders of magnitude

HP Van Hasselt, A Guez, M Hessel… - Advances in neural …, 2016 - proceedings.neurips.cc
Most learning algorithms are not invariant to the scale of the signal that is being
approximated. We propose to adaptively normalize the targets used in the learning updates …
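Adaptive target normalization can be sketched for a single scalar output. The following minimal Python sketch is written in the spirit of the paper's Pop-Art method but rests on assumptions not taken from the snippet. Target statistics are tracked with exponential moving averages using step size `beta`, and the network's last layer is a scalar linear map `w * h + b`. The class name `NormalizedLinearHead` is hypothetical. Learning happens on normalized targets. Whenever the statistics change, the weights are rescaled so that the unnormalized prediction `sigma * (w * h + b) + mu` is preserved exactly.

```python
import math


class NormalizedLinearHead:
    """Hypothetical scalar output layer trained on normalized targets."""

    def __init__(self, beta=0.01, eps=1e-4):
        self.w = 1.0      # weight of the normalized linear output
        self.b = 0.0      # bias of the normalized linear output
        self.mu = 0.0     # running first moment of the targets
        self.nu = 1.0     # running second moment of the targets
        self.beta = beta  # assumed EMA step size for the statistics
        self.eps = eps    # floor on the variance to avoid division by zero

    @property
    def sigma(self):
        return math.sqrt(max(self.nu - self.mu ** 2, self.eps))

    def predict(self, h):
        # Unnormalized prediction: rescale the normalized output back.
        return self.sigma * (self.w * h + self.b) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Rescale w and b so predict(h) is unchanged for every h.
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def sgd_step(self, h, target, lr=0.01):
        # Squared-error gradient step on the normalized output vs. normalized target.
        normalized_target = (target - self.mu) / self.sigma
        err = (self.w * h + self.b) - normalized_target
        self.w -= lr * err * h
        self.b -= lr * err


head = NormalizedLinearHead()
for target in [1.0, 10.0, 100.0, 1000.0]:
    head.update_stats(target)   # adapt the normalization first
    head.sgd_step(1.0, target)  # then learn on the normalized scale
print(head.predict(1.0))
```

Rescaling `w` and `b` alongside the statistics means a sudden jump in target magnitude does not change what the model currently outputs. Only the scale on which subsequent gradient steps operate changes.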

Reinforcement learning in continuous state and action spaces

H Van Hasselt - Reinforcement Learning: State-of-the-Art, 2012 - Springer
Many traditional reinforcement-learning algorithms have been designed for problems with
small finite state and action spaces. Learning in such discrete problems can be difficult …

Deep reinforcement learning for clinical decision support: a brief survey

S Liu, KY Ngiam, M Feng - arXiv preprint arXiv:1907.09475, 2019 - arxiv.org
Owing to recent advances in artificial intelligence, especially deep learning, many
data-driven decision support systems have been implemented to facilitate medical doctors in …

Towards 5G: A reinforcement learning-based scheduling solution for data traffic management

IS Comşa, S Zhang, ME Aydin… - … on Network and …, 2018 - ieeexplore.ieee.org
Dominated by delay-sensitive and massive data applications, radio resource management
in 5G access networks is expected to satisfy very stringent delay and packet loss …

Weighted double Q-learning

Z Zhang, Z Pan, MJ Kochenderfer - IJCAI, 2017 - ijcai.org
Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in
stochastic environments due to overestimating action values. Overestimation is due to the …

Reducing estimation bias via triplet-average deep deterministic policy gradient

D Wu, X Dong, J Shen, SCH Hoi - IEEE transactions on neural …, 2020 - ieeexplore.ieee.org
The overestimation caused by function approximation is a well-known property in Q-learning
algorithms, especially in single-critic models, which leads to poor performance in practical …

Unifying task specification in reinforcement learning

M White - International Conference on Machine Learning, 2017 - proceedings.mlr.press
Reinforcement learning tasks are typically specified as Markov decision processes. This
formalism has been highly successful, though specifications often couple the dynamics of …