Empirical study of off-policy policy evaluation for reinforcement learning - Academic Resource Search

Articles added in the past year, sorted by date

[PDF] Reinforcement Learning for Continuing Problems Using Average Reward

A Naik - 2024 - era.library.ualberta.ca

42 days ago - … on-policy setting with linear function approximation. We also show the first convergence
proof in the off-policy … In this section I empirically test Differential TD in both the on- and …

[PDF] arxiv.org

Off-Policy Evaluation from Logged Human Feedback

A Bhargava, L Jain, B Kveton, G Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

46 days ago - … for policy values, and show how to optimize them. We analyze unbiasedness of our
estimators and evaluate them empirically… step in training them is reinforcement learning with …

[PDF] ieee.org

Off-Policy Prediction Learning: An Empirical Study of Online Algorithms

S Ghiassian, B Rafiee, RS Sutton - … Networks and Learning …, 2024 - ieeexplore.ieee.org

49 days ago - … most challenging problems in reinforcement learning. This … ically studies 11 off-policy
prediction learning algorithms … evaluate the possibility of conducting a comparative study …

Related articles | All 2 versions

[PDF] arxiv.org

A Theory of Learnability for Offline Decision Making

C Mao, Q Zhang - arXiv preprint arXiv:2406.01378, 2024 - arxiv.org

56 days ago - … research has extensively studied specific offline decision making problems like
offline reinforcement learning (RL) and off-policy evaluation (… named Empirical Decision with …

Related articles | All 2 versions

[PDF] mlr.press

Policy Evaluation for Reinforcement Learning from Human Feedback: A Sample Complexity Analysis

Z Li, X Ji, M Chen, M Wang - International Conference on …, 2024 - proceedings.mlr.press

103 days ago - … Empirically, many studies have shown that learning human preferences requires
only a … This paper studies nonparametric off-policy evaluation in reinforcement learning with …

Integrating human learning and reinforcement learning: A novel approach to agent training

YH Li, F Zhang, Q Hua, XH Zhou - Knowledge-Based Systems, 2024 - Elsevier

112 days ago - … Off-policy reinforcement learning (RL) algorithms are known … The empirical
results in Section 5 demonstrated that … For computational simplicity, we test the SAC with two …

Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Y Hu, J Fu, G Wen, Y Lv, W Ren - Automatica, 2024 - Elsevier

117 days ago - … off-policy version of the proposed algorithm is developed which possesses scalability,
data efficiency and learning … , whose learning performance is empirically demonstrated …

Cited by 1 | Related articles

[PDF] sonar.ch

Reinforcement learning with general evaluators and generators of policies

F Faccio - 2024 - sonar.ch

118 days ago - … Lastly, we empirically demonstrate how this approach can be … In off-policy policy
optimization, we seek to find the … mostly local off-policy evaluation around the learned policy, …

Related articles | All 2 versions

[PDF] arxiv.org

When Do Off-Policy and On-Policy Policy Gradient Methods Align?

D Mambelli, S Bongers, O Zoeter, MTJ Spaan… - arXiv preprint arXiv …, 2024 - arxiv.org

161 days ago - … In the end, we empirically verify the impact of the on-off … to evaluate policies in
off-policy reinforcement learning and … optimization goal of the reinforcement learning paradigm. …

Related articles | All 2 versions

[PDF] arxiv.org

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

P Daoudi, M Formoso, O Gaizi, A Azize… - arXiv preprint arXiv …, 2023 - arxiv.org

219 days ago - … off-policy policy evaluation techniques. We show empirically the effectiveness of our
methods. … Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a …

Related articles | All 2 versions
