Insights in reinforcement learning

H Van Hasselt, A Guez, D Silver - … of the AAAI conference on artificial …, 2016 - ojs.aaai.org

The popular Q-learning algorithm is known to overestimate action values under certain
conditions. It was not previously known whether, in practice, such overestimations are …

Cited by 9490 Related articles All 17 versions
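The overestimation this abstract refers to arises when the same noisy estimates are used both to select and to evaluate the maximizing action. The Python sketch below uses function names that are illustrative, not taken from the paper. It shows the effect on a toy problem where every action's true value is 0: the single max estimator is biased upward, while a double estimator that selects with one independent set of estimates and evaluates with another is not. It also includes a Double DQN-style target that selects the next action with the online network's values and evaluates it with the target network's values. The network outputs are assumed to be plain lists.

```python
import random
import statistics


def max_estimator_bias(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Compare single (max) and double estimators of max_a E[Q(a)].

    Every action's true value is 0, so the true maximum is 0 and any
    positive average estimate is overestimation bias.
    """
    rng = random.Random(seed)

    def noisy_means():
        # One noisy sample-mean estimate per action (unit-variance noise).
        return [statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))
                for _ in range(n_actions)]

    single, double = [], []
    for _ in range(n_trials):
        est_a, est_b = noisy_means(), noisy_means()  # two independent estimate sets
        # Single estimator: the same estimates select and evaluate the action.
        single.append(max(est_a))
        # Double estimator: select with est_a, evaluate with independent est_b.
        best = max(range(n_actions), key=est_a.__getitem__)
        double.append(est_b[best])
    return statistics.fmean(single), statistics.fmean(double)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN-style target: online values pick a', target values score it."""
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[a_star]


single_est, double_est = max_estimator_bias()
print(f"single estimator: {single_est:.3f}, double estimator: {double_est:.3f}")
```

Increasing n_actions or the noise level widens the gap between the two averages. The double estimator removes this upward bias, although in other settings it can underestimate instead.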

[PDF] washington.edu

[PDF] PILCO: A model-based and data-efficient approach to policy search

M Deisenroth, CE Rasmussen - Proceedings of the 28th …, 2011 - aiweb.cs.washington.edu

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search
method. PILCO reduces model bias, one of the key problems of model-based reinforcement …

Cited by 2000 Related articles All 28 versions

[PDF] academia.edu

[BOOK][B] Reinforcement learning: An introduction

RS Sutton, AG Barto - 2018 - books.google.com

The significantly expanded and updated new edition of a widely used text on reinforcement
learning, one of the most active research areas in artificial intelligence. Reinforcement …

Cited by 73877 Related articles All 54 versions

[PDF] neurips.cc

Learning values across many orders of magnitude

HP Van Hasselt, A Guez, M Hessel… - Advances in neural …, 2016 - proceedings.neurips.cc

Most learning algorithms are not invariant to the scale of the signal that is being
approximated. We propose to adaptively normalize the targets used in the learning updates …

Cited by 195 Related articles All 5 versions
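The adaptive normalization described in this abstract is the paper's Pop-Art method ("Preserving Outputs Precisely, while Adaptively Rescaling Targets"). The following is a minimal sketch that assumes a hypothetical scalar linear output layer with weight w and bias b, plus exponential running moments with step size beta. Targets are normalized using the running mean and standard deviation. Whenever those statistics change, w and b are rescaled so the unnormalized prediction for any input stays the same. The class name and defaults are illustrative, and the gradient step on the normalized loss is omitted.

```python
import math


class PopArtNormalizer:
    """Running target statistics plus output-preserving rescaling (sketch)."""

    def __init__(self, beta=1e-3, eps=1e-8):
        self.beta = beta  # step size for the running moments (assumed)
        self.eps = eps    # floor on the variance to avoid division by zero
        self.mu = 0.0     # running first moment of the targets
        self.nu = 1.0     # running second moment of the targets
        # Hypothetical scalar output layer: normalized prediction = w * h + b.
        self.w = 1.0
        self.b = 0.0

    @property
    def sigma(self):
        return math.sqrt(max(self.nu - self.mu ** 2, self.eps))

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Keep sigma * (w * h + b) + mu unchanged for every h.
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalize(self, target):
        return (target - self.mu) / self.sigma

    def unnormalized_output(self, h):
        return self.sigma * (self.w * h + self.b) + self.mu


p = PopArtNormalizer(beta=0.5)
before = p.unnormalized_output(2.0)
p.update_stats(1000.0)  # a target far outside the current scale
after = p.unnormalized_output(2.0)
print(before, after)  # equal up to floating-point rounding
```

In a full learner, this rescaling would be applied to the last layer of a network. The network's normalized output would then be regressed onto normalize(target).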

[PDF] hadovanhasselt.com

Reinforcement learning in continuous state and action spaces

H Van Hasselt - Reinforcement Learning: State-of-the-Art, 2012 - Springer

Many traditional reinforcement-learning algorithms have been designed for problems with
small finite state and action spaces. Learning in such discrete problems can be difficult …

Cited by 291 Related articles All 10 versions

[PDF] arxiv.org

Deep reinforcement learning for clinical decision support: a brief survey

S Liu, KY Ngiam, M Feng - arXiv preprint arXiv:1907.09475, 2019 - arxiv.org

Owing to recent advances in Artificial Intelligence, especially deep learning, many
data-driven decision support systems have been implemented to support medical doctors in …

Cited by 29 Related articles All 2 versions

[PDF] openrepository.com

Towards 5G: A reinforcement learning-based scheduling solution for data traffic management

IS Comşa, S Zhang, ME Aydin… - … on Network and …, 2018 - ieeexplore.ieee.org

Dominated by delay-sensitive and massive data applications, radio resource management
in 5G access networks is expected to satisfy very stringent delay and packet loss …

Cited by 105 Related articles All 14 versions

[PDF] ijcai.org

[PDF] Weighted double Q-learning

Z Zhang, Z Pan, MJ Kochenderfer - IJCAI, 2017 - ijcai.org

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in
stochastic environments due to overestimating action values. Overestimation is due to the …

Cited by 111 Related articles All 3 versions
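Weighted double Q-learning forms its target as a weighted combination of the single (max) estimate and the double estimate. The sketch below simplifies this by using a fixed weight beta. The paper instead adapts the weight during learning, so the sketch illustrates the blend rather than the paper's exact rule. With beta = 1 the target reduces to the Q-learning target, and with beta = 0 it reduces to the double Q-learning target. Tables are assumed to be nested lists indexed as q[state][action], and the function names are hypothetical.

```python
import random


def weighted_double_estimate(q_select, q_eval, beta):
    """Blend of single and double estimates at a* = argmax_a q_select[a].

    beta = 1 gives the single (Q-learning) estimate q_select[a*];
    beta = 0 gives the double estimate q_eval[a*].
    """
    a_star = max(range(len(q_select)), key=q_select.__getitem__)
    return beta * q_select[a_star] + (1.0 - beta) * q_eval[a_star]


def weighted_double_q_step(q_u, q_v, s, a, r, s_next, alpha, gamma, beta, rng, done=False):
    """One tabular update; tables are nested lists q[state][action].

    As in double Q-learning, a coin flip picks which table is updated and
    the other table supplies the double estimate. beta is held fixed here,
    a simplification of the paper's adaptive weighting.
    """
    if rng.random() < 0.5:
        q_u, q_v = q_v, q_u  # update the other table this step
    bootstrap = 0.0 if done else weighted_double_estimate(q_u[s_next], q_v[s_next], beta)
    target = r + gamma * bootstrap
    q_u[s][a] += alpha * (target - q_u[s][a])  # in-place update of the chosen table


rng = random.Random(0)
q_u = [[0.0, 0.0], [1.0, 3.0]]
q_v = [[0.0, 0.0], [2.0, 0.5]]
weighted_double_q_step(q_u, q_v, s=0, a=1, r=1.0, s_next=1,
                       alpha=0.5, gamma=0.9, beta=0.25, rng=rng)
print(q_u[0], q_v[0])
```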

[PDF] smu.edu.sg

Reducing estimation bias via triplet-average deep deterministic policy gradient

D Wu, X Dong, J Shen, SCH Hoi - IEEE transactions on neural …, 2020 - ieeexplore.ieee.org

The overestimation caused by function approximation is a well-known property in Q-learning
algorithms, especially in single-critic models, which leads to poor performance in practical …

Cited by 70 Related articles All 4 versions

[PDF] mlr.press

Unifying task specification in reinforcement learning

M White - International Conference on Machine Learning, 2017 - proceedings.mlr.press

Reinforcement learning tasks are typically specified as Markov decision processes. This
formalism has been highly successful, though specifications often couple the dynamics of …

Cited by 114 Related articles All 6 versions
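White proposes transition-based discounting as one construct for unifying episodic and continuing tasks. Under this scheme the discount is a function γ(s, a, s') of each transition rather than a constant. The following is a minimal sketch that assumes the per-step discounts of a single trajectory are already available as a list. The function name is illustrative. Setting the discount to 0 on the transition that ends an episode cuts the return at that point, which lets episodic and continuing tasks share one return computation.

```python
def returns_with_transition_discount(rewards, discounts):
    """Compute G_t = R_{t+1} + gamma_{t+1} * G_{t+1} for one trajectory.

    rewards[t] is the reward on transition t and discounts[t] is
    gamma(s_t, a_t, s_{t+1}) for that same transition. The return after the
    last transition is taken as 0 (truncation, an assumption of this sketch).
    """
    if len(rewards) != len(discounts):
        raise ValueError("rewards and discounts must have the same length")
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + discounts[t] * g
        returns[t] = g
    return returns


# A continuing stream containing an episode boundary: the discount is 0 on the
# transition that ends the first episode, so no return flows across it.
print(returns_with_transition_discount([1.0, 1.0, 1.0, 1.0], [0.9, 0.0, 0.9, 0.9]))
```

With a constant discount list, the function reduces to the ordinary discounted return.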
