Empirical study of off-policy policy evaluation for reinforcement learning - Academic Resource Search

[PDF] ISTANBUL TECHNICAL UNIVERSITY GRADUATE SCHOOL

ODVIADR LEARNING, Ö Ugur - 2022 - polen.itu.edu.tr

… For these comparisons, empirical analyzes were applied using … Q-learning is an off-policy
approach and it is a quite good … that we study, but these days are not included into the test set …

A Survey of Offline Reinforcement Learning Methods Based on Representation Learning

王雪松, 王荣荣, 程玉虎 - Acta Automatica Sinica, 2024 - aas.net.cn

… the recent research on offline reinforcement learning based on representation … , offline policy
evaluation and model selection. Moreover, the study trends of offline reinforcement learning …

A Survey of Deep Reinforcement Learning

王浩楠, 刘苧, 章艺云, 冯大伟, 黄峰… - Frontiers of Information Technology & Electronic Engineering …, 2022 - fitee.zjujournals.com

… SAC combines off-policy updates with a stable stochastic AC … sample efficiency over off-policy
and on-policy prior methods, … : off-policy maximum entropy deep reinforcement learning …

Self-Supervised Sepsis Treatment Recommendation Algorithm

S Zhu, J Pu, AS Zhu, AJ Pu - Frontiers, 2021 - jzus.zju.edu.cn

… evaluation method is proposed to separate patient samples into two domains according to
their responses to treatments and the state value of the chosen policy. … reinforcement learning …

Reinforcement Learning Methods for Multi-Agent Strategy Competition Games

王宇軒 - 2019 - thuir.lib.thu.edu.tw

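… Policy Gradient algorithm, whose basic idea is to change Policy Gradient from an on-policy to an off-policy form. Here,
on-policy … updates using the actions taken and the responses observed in …; off-policy instead uses another agent to collect as much as possible …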

Sampling Strategy for Offline Reinforcement Learning Based on Temporal-Difference Error

张龙飞, 冯旸赫, 梁星星, 刘世旋, 程光权, 黄金才 - Chinese Journal of Engineering, 2023 - cje.ustb.edu.cn

… data or other empirical data to learn action strategies offline … sampling strategy for offline
reinforcement learning based on TD-… To this end, researchers have proposed off-policy reinforcement learning methods, such as asynchronous …

[PDF] ISTANBUL TECHNICAL UNIVERSITY GRADUATE SCHOOL

E ÇAKMAK - 2023 - polen.itu.edu.tr

… presented study is to show that reinforcement learning based … The most preferred model-free
reinforcement algorithms in … the main aspects of DDPG is that it is an off-policy algorithm. …

[PDF] A Survey of Deep Reinforcement Learning: With a Discussion of the Development of Computer Go

赵冬斌, 邵坤, 朱圆恒, 李栋, 陈亚冉, 王海涛… - Control Theory and …, 2016 - researchgate.net

… reinforcement learning, reviews the history of computer Go concurrently, analyzes the
algorithms features, and discusses the research … Monte Carlo methods can also be combined with off-policy …

Cited by 38 · Related articles · All 4 versions

Slope Gait Control Method for Biped Robots Based on Deep Reinforcement Learning

吴晓光, 刘绍维, 杨磊, 邓文强, 贾哲恒 - Acta Automatica Sinica, 2021 - aas.net.cn

… passive biped robot based on deep reinforcement learning. By analyzing the hybrid dynamics
… learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy …

Cited by 8 · Related articles · All 3 versions

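Quadrotor Control Based on Proximal Policy Optimization with Integral Compensation

胡欢, 王庆领 - Frontiers of Information Technology & Electronic Engineering, 2022 - fitee.zjujournals.com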

… proximal policy optimization (PPO) reinforcement learning … is the result of off-policy learning,
TD-learning, and nonlinear function … When the empirical data in the buffer reaches the set …
