Mastering the game of Go without human knowledge

D Silver, J Schrittwieser, K Simonyan, I Antonoglou… - Nature, 2017 - nature.com
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa,
superhuman proficiency in challenging domains. Recently, AlphaGo became the first …

[PDF] Trust Region Policy Optimization

J Schulman - arXiv preprint arXiv:1502.05477, 2015 - people.engr.tamu.edu
In this article, we describe a method for optimizing control policies, with guaranteed
monotonic improvement. By making several approximations to the theoretically-justified …

Deep reinforcement learning in large discrete action spaces

G Dulac-Arnold, R Evans, H van Hasselt… - arXiv preprint arXiv …, 2015 - arxiv.org
Being able to reason in an environment with a large number of discrete actions is essential
to bringing reinforcement learning to a larger class of problems. Recommender systems …

Learn what not to learn: Action elimination with deep reinforcement learning

T Zahavy, M Haroush, N Merlis… - Advances in neural …, 2018 - proceedings.neurips.cc
Learning how to act when there are many available actions in each state is a challenging
task for Reinforcement Learning (RL) agents, especially when many of the actions are …

Adversarial environment reinforcement learning algorithm for intrusion detection

G Caminero, M Lopez-Martin, B Carro - Computer Networks, 2019 - Elsevier
Intrusion detection is a crucial service in today's data networks, and the search for new fast
and robust algorithms that are capable of detecting and classifying dangerous traffic is …

[BOOK] Reinforcement learning and dynamic programming using function approximators

L Busoniu, R Babuska, B De Schutter, D Ernst - 2017 - taylorfrancis.com
From household appliances to applications in robotics, engineered systems involving
complex dynamics can only be as effective as the algorithms that control them. While …

[PDF] Tree-based batch mode reinforcement learning

D Ernst, P Geurts, L Wehenkel - Journal of Machine Learning Research, 2005 - jmlr.org
Reinforcement learning aims to determine an optimal control policy from interaction with a
system or from observations gathered from a system. In batch mode, it can be achieved by …

Approximate reinforcement learning: An overview

L Buşoniu, D Ernst, B De Schutter… - 2011 IEEE symposium …, 2011 - ieeexplore.ieee.org
Reinforcement learning (RL) allows agents to learn how to optimally interact with complex
environments. Fueled by recent advances in approximation-based algorithms, RL has …

On the role of planning in model-based deep reinforcement learning

JB Hamrick, AL Friesen, F Behbahani, A Guez… - arXiv preprint arXiv …, 2020 - arxiv.org
Model-based planning is often thought to be necessary for deep, careful reasoning and
generalization in artificial agents. While recent successes of model-based reinforcement …

Multiagent reinforcement learning: Rollout and policy iteration

D Bertsekas - IEEE/CAA Journal of Automatica Sinica, 2021 - ieeexplore.ieee.org
We discuss the solution of complex multistage decision problems using methods that are
based on the idea of policy iteration (PI), i.e., start from some base policy and generate an …