A comprehensive survey on machine learning for networking: evolution, applications and research opportunities

R Boutaba, MA Salahuddin, N Limam, S Ayoubi… - Journal of Internet …, 2018 - Springer
Abstract Machine Learning (ML) has been enjoying an unprecedented surge in applications
that solve problems and enable automation in diverse domains. Primarily, this is due to the …

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Batch reinforcement learning

S Lange, T Gabel, M Riedmiller - Reinforcement learning: State-of-the-art, 2012 - Springer
Batch reinforcement learning is a subfield of dynamic programming-based reinforcement
learning. Originally defined as the task of learning the best possible policy from a fixed set of …

[PDF][PDF] Coordinated reinforcement learning

C Guestrin, M Lagoudakis, R Parr - ICML, 2002 - Citeseer
We present several new algorithms for multiagent reinforcement learning. A common feature
of these algorithms is a parameterized, structured representation of a policy or value …

High confidence policy improvement

P Thomas, G Theocharous… - … on Machine Learning, 2015 - proceedings.mlr.press
We present a batch reinforcement learning (RL) algorithm that provides probabilistic
guarantees about the quality of each policy that it proposes, and which has no hyper …

[PDF][PDF] Reinforcement learning for humanoid robotics

J Peters, S Vijayakumar… - Proceedings of the …, 2003 - ias.informatik.tu-darmstadt.de
Reinforcement learning offers one of the most general frameworks to take traditional robotics
towards true autonomy and versatility. However, applying reinforcement learning to high …

[PDF][PDF] Error bounds for approximate policy iteration

R Munos - ICML, 2003 - Citeseer
Error Bounds for Approximate Policy Iteration, Rémi Munos, Centre de Mathématiques
Appliquées, Ecole Polytechnique …

Basis function adaptation in temporal difference reinforcement learning

I Menache, S Mannor, N Shimkin - Annals of Operations Research, 2005 - Springer
Reinforcement Learning (RL) is an approach for solving complex multi-stage decision
problems that fall under the general framework of Markov Decision Problems (MDPs), with …

Bias in natural actor-critic algorithms

P Thomas - International conference on machine learning, 2014 - proceedings.mlr.press
We show that several popular discounted reward natural actor-critics, including the popular
NAC-LSTD and eNAC algorithms, do not generate unbiased estimates of the natural policy …

[PDF][PDF] Reinforcement learning as classification: Leveraging modern classifiers

MG Lagoudakis, R Parr - … of the 20th International Conference on …, 2003 - cdn.aaai.org
The basic tools of machine learning appear in the inner loop of most reinforcement learning
algorithms, typically in the form of Monte Carlo methods or function approximation …