Reinforcement learning with deep energy-based policies

T Haarnoja, H Tang, P Abbeel… - … conference on machine …, 2017 - proceedings.mlr.press
Deep reinforcement learning (deep RL) has emerged as a … entropy policies with approximate
inference for reinforcement … Our reinforcement learning problem can be defined as policy

Approximate policy-based accelerated deep reinforcement learning

X Wang, Y Gu, Y Cheng, A Liu… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
… The proposed APA is proven to be convergent even with a more aggressive learning rate, …
algorithm with deep Q-network (DQN), Double DQN and deep deterministic policy gradient (…

Illuminating generalization in deep reinforcement learning through procedural level generation

N Justesen, RR Torrado, P Bontrager, A Khalifa… - arXiv preprint arXiv …, 2018 - arxiv.org
… in deep reinforcement learning research (e.g. the Arcade Learning Environment [3]). Our
findings suggest that policies … on a higher level, to the distribution of generated levels presented …

What matters in on-policy reinforcement learning? a large-scale empirical study

M Andrychowicz, A Raichuk, P Stańczyk… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep reinforcement learning (RL) has seen increased interest in recent years due to its …
the importance of hyperparameter tuning and the high level of stochasticity due to random seeds…

Benchmarking deep reinforcement learning for continuous control

Y Duan, X Chen, R Houthooft… - International …, 2016 - proceedings.mlr.press
policies, we use the notation µθ : S→A to denote the policy … , where higher level decisions
can reuse lower level skills (… where both low-level motor controls and high-level decisions are …

How to discount deep reinforcement learning: Towards new dynamic strategies

V François-Lavet, R Fonteneau, D Ernst - arXiv preprint arXiv:1512.02011, 2015 - arxiv.org
… Following the ideas discussed above, Figure 7 represents the general update scheme that
we propose to further improve the performance of deep reinforcement learning algorithms. …

Supervised policy update for deep reinforcement learning

Q Vuong, Y Zhang, KW Ross - arXiv preprint arXiv:1805.11706, 2018 - arxiv.org
Policy Update (SPU), for deep reinforcement learning. Starting with data generated by the
current policy… in the non-parameterized proximal policy space. Using supervised regression, it …

Automated lane change strategy using proximal policy optimization-based deep reinforcement learning

F Ye, X Cheng, P Wang, CY Chan… - 2020 IEEE Intelligent …, 2020 - ieeexplore.ieee.org
… Using SUMO and its associated traffic control interface (TraCI), we can access the vehicle
information in the road network, and execute high-level decisions in the learner model and take …

Hierarchical deep reinforcement learning for continuous action control

Z Yang, K Merrick, L Jin… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
… -deep deterministic policy gradient (h-DDPG). The proposed algorithm comprises two levels
of … learning methods to further improve the performance of top level hierarchy (which they …

Accelerated methods for deep reinforcement learning

A Stooke, P Abbeel - arXiv preprint arXiv:1803.02811, 2018 - arxiv.org
deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs.
We confirm that both policy … DGX-1 to learn successful strategies in Atari games in mere …