A Survey of Deep Reinforcement Learning

Wang Haonan, Liu Ning, Zhang Yiyun, Feng Dawei, Huang Feng… - Frontiers of Information Technology & Electronic Engineering …, 2022 - fitee.zjujournals.com
… but efficient design of actor-critic methods to both continuous and … ous states and action
spaces or meet the requirements for effective … Soft actor-critic: off-policy maximum entropy deep …