Model-free least-squares policy iteration

A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

R Boutaba, MA Salahuddin, N Limam, S Ayoubi… - Journal of Internet …, 2018 - Springer

Machine Learning (ML) has been enjoying an unprecedented surge in applications
that solve problems and enable automation in diverse domains. Primarily, this is due to the …

Cited by 1252 · Related articles · All 15 versions

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 156 · Related articles · All 3 versions

[PDF] machine-learning-lab.com

Batch reinforcement learning

S Lange, T Gabel, M Riedmiller - Reinforcement learning: State-of-the-art, 2012 - Springer

Batch reinforcement learning is a subfield of dynamic programming-based reinforcement
learning. Originally defined as the task of learning the best possible policy from a fixed set of …

Cited by 809 · Related articles · All 9 versions

[PDF] psu.edu

[PDF] Coordinated reinforcement learning

C Guestrin, M Lagoudakis, R Parr - ICML, 2002 - Citeseer

We present several new algorithms for multiagent reinforcement learning. A common feature
of these algorithms is a parameterized, structured representation of a policy or value …

Cited by 547 · Related articles · All 27 versions

[PDF] mlr.press

High confidence policy improvement

P Thomas, G Theocharous… - … on Machine Learning, 2015 - proceedings.mlr.press

We present a batch reinforcement learning (RL) algorithm that provides probabilistic
guarantees about the quality of each policy that it proposes, and which has no hyper …

Cited by 225 · Related articles · All 8 versions

[PDF] tu-darmstadt.de

[PDF] Reinforcement learning for humanoid robotics

J Peters, S Vijayakumar… - Proceedings of the …, 2003 - ias.informatik.tu-darmstadt.de

Reinforcement learning offers one of the most general frameworks to take traditional robotics
towards true autonomy and versatility. However, applying reinforcement learning to high …

Cited by 537 · Related articles · All 16 versions

[PDF] psu.edu

[PDF] Error bounds for approximate policy iteration

R Munos - ICML, 2003 - Citeseer

Error Bounds for Approximate Policy Iteration. Rémi Munos, Centre de Mathématiques
Appliquées, Ecole Polytechnique …

Cited by 361 · Related articles · All 4 versions

[PDF] psu.edu

Basis function adaptation in temporal difference reinforcement learning

I Menache, S Mannor, N Shimkin - Annals of Operations Research, 2005 - Springer

Reinforcement Learning (RL) is an approach for solving complex multi-stage decision
problems that fall under the general framework of Markov Decision Problems (MDPs), with …

Cited by 258 · Related articles · All 12 versions

[PDF] mlr.press

Bias in natural actor-critic algorithms

P Thomas - International conference on machine learning, 2014 - proceedings.mlr.press

We show that several popular discounted reward natural actor-critics, including the popular
NAC-LSTD and eNAC algorithms, do not generate unbiased estimates of the natural policy …

Cited by 170 · Related articles · All 10 versions

[PDF] aaai.org

[PDF] Reinforcement learning as classification: Leveraging modern classifiers

MG Lagoudakis, R Parr - … of the 20th International Conference on …, 2003 - cdn.aaai.org

The basic tools of machine learning appear in the inner loop of most reinforcement learning
algorithms, typically in the form of Monte Carlo methods or function approximation …

Cited by 222 · Related articles · All 7 versions
