Novel advanced policy gradient (APG) methods, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO), have become the dominant reinforcement learning …
JG Dai, M Gluzman - arXiv e-prints, 2020 - ui.adsabs.harvard.edu
For more than 30 years, one of the most difficult problems in applied probability and operations research has been to find a scalable algorithm for approximately solving the optimal …