[PDF] ISTANBUL TECHNICAL UNIVERSITY GRADUATE SCHOOL

ODVIADR LEARNING, Ö Ugur - 2022 - polen.itu.edu.tr
… For these comparisons, empirical analyzes were applied using … Q-learning is an off-policy
approach and it is a quite good … that we study, but these days are not included into the test set …

A Survey of Offline Reinforcement Learning Methods Based on Representation Learning

王雪松, 王荣荣, 程玉虎 - 自动化学报, 2024 - aas.net.cn
… the recent research on offline reinforcement learning based on representation … , offline policy
evaluation and model selection. Moreover, the study trends of offline reinforcement learning

A Survey of Deep Reinforcement Learning

王浩楠, 刘苧, 章艺云, 冯大伟, 黄峰… - 信息与电子工程前沿 …, 2022 - fitee.zjujournals.com
… SAC combines off-policy updates with a stable stochastic AC … sample efficiency over off-policy
and on-policy prior methods, … : off-policy maximum entropy deep reinforcement learning

A Self-Supervised Sepsis Treatment Recommendation Algorithm

S Zhu, J Pu, AS Zhu, AJ Pu - Frontiers, 2021 - jzus.zju.edu.cn
evaluation method is proposed to separate patient samples into two domains according to
their responses to treatments and the state value of the chosen policy. … reinforcement learning

Reinforcement Learning Methods for Multi-Agent Strategy Competition Games

王宇軒 - 2019 - thuir.lib.thu.edu.tw
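a Policy Gradient algorithm whose basic idea is to change Policy Gradient from an On-Policy to an Off-Policy form, where
the so-called On-Policy … updates based on the actions taken and the responses received; Off-Policy, by contrast, uses another agent to collect as much as possible…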

A Sampling Strategy for Offline Reinforcement Learning Based on Temporal-Difference Error

张龙飞, 冯旸赫, 梁星星, 刘世旋, 程光权, 黄金才 - 工程科学学报, 2023 - cje.ustb.edu.cn
… data or other empirical data to learn action strategies offline … sampling strategy for offline
reinforcement learning based on TD-… To this end, researchers have proposed off-policy reinforcement learning methods, such as asynchronous…

[PDF] ISTANBUL TECHNICAL UNIVERSITY GRADUATE SCHOOL

E ÇAKMAK - 2023 - polen.itu.edu.tr
… presented study is to show that reinforcement learning based … The most preferred model-free
reinforcement algorithms in … the main aspects of DDPG is that it is an off-policy algorithm. …

[PDF] A Survey of Deep Reinforcement Learning: With Remarks on the Development of Computer Go

赵冬斌, 邵坤, 朱圆恒, 李栋, 陈亚冉, 王海涛… - 控制理论与 …, 2016 - researchgate.net
reinforcement learning, reviews the history of computer Go concurrently, analyzes the
algorithms features, and discusses the research … the Monte Carlo method can also, together with the off-policy …

A Slope Gait Control Method for Biped Robots Based on Deep Reinforcement Learning

吴晓光, 刘绍维, 杨磊, 邓文强, 贾哲恒 - 自动化学报, 2021 - aas.net.cn
… passive biped robot based on deep reinforcement learning. By analyzing the hybrid dynamics
learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy

Quadrotor Control Based on a Proximal Policy Optimization Algorithm with Integral Compensation

胡欢, 王庆领 - 信息与电子工程前沿(英文), 2022 - fitee.zjujournals.com
… proximal policy optimization (PPO) reinforcement learning … is the result of off-policy learning,
TD-learning, and nonlinear function … When the empirical data in the buffer reaches the set …