Articles added in the past year, sorted by date

[PDF] Reinforcement Learning for Continuing Problems Using Average Reward

A Naik - 2024 - era.library.ualberta.ca
42 days ago - … on-policy setting with linear function approximation. We also show the first convergence
proof in the off-policy … In this section I empirically test Differential TD in both the on- and …

Off-Policy Evaluation from Logged Human Feedback

A Bhargava, L Jain, B Kveton, G Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
46 days ago - … for policy values, and show how to optimize them. We analyze unbiasedness of our
estimators and evaluate them empirically… step in training them is reinforcement learning with …

Off-Policy Prediction Learning: An Empirical Study of Online Algorithms

S Ghiassian, B Rafiee, RS Sutton - … Networks and Learning …, 2024 - ieeexplore.ieee.org
49 days ago - … most challenging problems in reinforcement learning. This … ically studies 11 off-policy
prediction learning algorithms … evaluate the possibility of conducting a comparative study

A Theory of Learnability for Offline Decision Making

C Mao, Q Zhang - arXiv preprint arXiv:2406.01378, 2024 - arxiv.org
56 days ago - research has extensively studied specific offline decision making problems like
offline reinforcement learning (RL) and off-policy evaluation (… named Empirical Decision with …

Policy Evaluation for Reinforcement Learning from Human Feedback: A Sample Complexity Analysis

Z Li, X Ji, M Chen, M Wang - International Conference on …, 2024 - proceedings.mlr.press
103 days ago - Empirically, many studies have shown that learning human preferences requires
only a … This paper studies nonparametric off-policy evaluation in reinforcement learning with …

Integrating human learning and reinforcement learning: A novel approach to agent training

YH Li, F Zhang, Q Hua, XH Zhou - Knowledge-Based Systems, 2024 - Elsevier
112 days ago - Off-policy reinforcement learning (RL) algorithms are known … The empirical
results in Section 5 demonstrated that … For computational simplicity, we test the SAC with two …

Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Y Hu, J Fu, G Wen, Y Lv, W Ren - Automatica, 2024 - Elsevier
117 days ago - off-policy version of the proposed algorithm is developed which possesses scalability,
data efficiency and learning … , whose learning performance is empirically demonstrated …

Reinforcement learning with general evaluators and generators of policies

F Faccio - 2024 - sonar.ch
118 days ago - … Lastly, we empirically demonstrate how this approach can be … In off-policy policy
optimization, we seek to find the … mostly local off-policy evaluation around the learned policy, …

When Do Off-Policy and On-Policy Policy Gradient Methods Align?

D Mambelli, S Bongers, O Zoeter, MTJ Spaan… - arXiv preprint arXiv …, 2024 - arxiv.org
161 days ago - … In the end, we empirically verify the impact of the on-off … to evaluate policies in
off-policy reinforcement learning and … optimization goal of the reinforcement learning paradigm. …

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

P Daoudi, M Formoso, O Gaizi, A Azize… - arXiv preprint arXiv …, 2023 - arxiv.org
219 days ago - off-policy policy evaluation techniques. We show empirically the effectiveness of our
methods. … Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a …