Deep reinforcement learning: A brief survey

K Arulkumaran, MP Deisenroth… - IEEE Signal …, 2017 - ieeexplore.ieee.org
… , deep learning is enabling reinforcement learning (RL) to scale to problems that were previously
intractable, such as learning to … policy iteration, where policy iteration consists of policy

Reinforcement learning with deep energy-based policies

T Haarnoja, H Tang, P Abbeel… - … on machine learning, 2017 - proceedings.mlr.press
… In this section, we will define the reinforcement learning problem that we are addressing and
briefly summarize the maximum entropy policy … Maximum Entropy Reinforcement Learning

EDGE: Explaining deep reinforcement learning policies

W Guo, X Wu, U Khan, X Xing - Advances in Neural …, 2021 - proceedings.neurips.cc
… Deep reinforcement learning has shown great success in automatic policy learning for …
policies, whereas our method is applicable to DRL policies with arbitrary network structures. …

Deep reinforcement learning: An overview

Y Li - arXiv preprint arXiv:1701.07274, 2017 - arxiv.org
… background of machine learning, deep learning and reinforcement learning in Section 2.
Next we discuss core RL elements, including value function in Section 3.1, policy in Section 3.2…

Learning curriculum policies for reinforcement learning

S Narvekar, P Stone - arXiv preprint arXiv:1812.00285, 2018 - arxiv.org
… However, as the problems we task reinforcement learning agents with become ever more
complex, it may be beneficial (and even necessary) to gradually acquire skills over multiple …

Continuous control with deep reinforcement learning

TP Lillicrap, JJ Hunt, A Pritzel, N Heess, T Erez… - arXiv preprint arXiv …, 2015 - arxiv.org
… on the policy π, and may be stochastic. The goal in reinforcement learning is to learn a policy
which maximizes the expected return from the start distribution $J = \mathbb{E}_{r_i, s_i \sim E, a_i \sim \pi}[R_1]$. We …

Fast reinforcement learning with generalized policy updates

A Barreto, S Hou, D Borsa, D Silver… - Proceedings of the …, 2020 - National Acad Sciences
… within the standard reinforcement-learning formalism. The … in reinforcement learning: policy
improvement and policy … , we can reduce a reinforcement-learning problem to a simpler …

Introduction to reinforcement learning

Z Ding, Y Huang, H Yuan, H Dong - Deep reinforcement learning …, 2020 - Springer
… given policy π, over the sampled trajectories guided by the policy. We call this “on-policy”
manner as in reinforcement learning the policy … is conditioned on or estimated by current policy. …

Model-based reinforcement learning for Atari

L Kaiser, M Babaeizadeh, P Milos, B Osinski… - arXiv preprint arXiv …, 2019 - arxiv.org
… (SimPLe), that utilizes these video prediction techniques and trains a policy to play the … ,
where the policy is deployed to collect more data in the original game, we learn a policy that, for …

Time limits in reinforcement learning

F Pardo, A Tavakoli, V Levdik… - … on Machine Learning, 2018 - proceedings.mlr.press
… state-values and policies learned by tabular Q-learning overlaid on our … policies that are
limited to a fraction of the state space. In Section 3, we show that in order to learn good policies