reinforcement learning policies - Academic Resource Search

Deep reinforcement learning: A brief survey

K Arulkumaran, MP Deisenroth… - IEEE Signal …, 2017 - ieeexplore.ieee.org

… , deep learning is enabling reinforcement learning (RL) to scale to problems that were previously
intractable, such as learning to … policy iteration, where policy iteration consists of policy …

Cited by 3211 · Related articles · All 6 versions

[PDF] mlr.press

Reinforcement learning with deep energy-based policies

T Haarnoja, H Tang, P Abbeel… - … on machine learning, 2017 - proceedings.mlr.press

… In this section, we will define the reinforcement learning problem that we are addressing and
briefly summarize the maximum entropy policy … Maximum Entropy Reinforcement Learning …

Cited by 1401 · Related articles · All 5 versions

[PDF] neurips.cc

Edge: Explaining deep reinforcement learning policies

W Guo, X Wu, U Khan, X Xing - Advances in Neural …, 2021 - proceedings.neurips.cc

… Deep reinforcement learning has shown great success in automatic policy learning for …
policies, whereas our method is applicable to DRL policies with arbitrary network structures. …

Cited by 59 · Related articles · All 7 versions

[PDF] arxiv.org

Deep reinforcement learning: An overview

Y Li - arXiv preprint arXiv:1701.07274, 2017 - arxiv.org

… background of machine learning, deep learning and reinforcement learning in Section 2.
Next we discuss core RL elements, including value function in Section 3.1, policy in Section 3.2…

Cited by 1777 · Related articles · All 6 versions

[PDF] arxiv.org

Learning curriculum policies for reinforcement learning

S Narvekar, P Stone - arXiv preprint arXiv:1812.00285, 2018 - arxiv.org

… However, as the problems we task reinforcement learning agents with become ever more
complex, it may be beneficial (and even necessary) to gradually acquire skills over multiple …

Cited by 111 · Related articles · All 7 versions

[PDF] arxiv.org

Continuous control with deep reinforcement learning

TP Lillicrap, JJ Hunt, A Pritzel, N Heess, T Erez… - arXiv preprint arXiv …, 2015 - arxiv.org

… on the policy π, and may be stochastic. The goal in reinforcement learning is to learn a policy
which maximizes the expected return from the start distribution $J = \mathbb{E}_{r_i, s_i \sim E,\, a_i \sim \pi}[R_1]$. We …

Cited by 16072 · Related articles · All 15 versions

[PDF] pnas.org Full View

Fast reinforcement learning with generalized policy updates

A Barreto, S Hou, D Borsa, D Silver… - Proceedings of the …, 2020 - National Acad Sciences

… within the standard reinforcement-learning formalism. The … in reinforcement learning: policy
improvement and policy … , we can reduce a reinforcement-learning problem to a simpler …

Cited by 136 · Related articles · All 8 versions

[PDF] wisc.edu

Introduction to reinforcement learning

Z Ding, Y Huang, H Yuan, H Dong - Deep reinforcement learning …, 2020 - Springer

… given policy π, over the sampled trajectories guided by the policy. We call this “on-policy”
manner as in reinforcement learning the policy … is conditioned on or estimated by current policy. …

Cited by 108 · Related articles · All 5 versions

[PDF] arxiv.org

Model-based reinforcement learning for Atari

L Kaiser, M Babaeizadeh, P Milos, B Osinski… - arXiv preprint arXiv …, 2019 - arxiv.org

… (SimPLe), that utilizes these video prediction techniques and trains a policy to play the … ,
where the policy is deployed to collect more data in the original game, we learn a policy that, for …

Cited by 910 · Related articles · All 6 versions

[PDF] mlr.press

Time limits in reinforcement learning

F Pardo, A Tavakoli, V Levdik… - … on Machine Learning, 2018 - proceedings.mlr.press

… state-values and policies learned by tabular Q-learning overlaid on our … policies that are
limited to a fraction of the state space. In Section 3, we show that in order to learn good policies …

Cited by 160 · Related articles · All 12 versions
