Data-efficient policy evaluation through behavior policy search

AM Metelli, S Meta, M Restelli - Uncertainty in Artificial …, 2023 - proceedings.mlr.press

Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …

Cited by 2 · Related articles · All 9 versions

[PDF] aaai.org

A new challenge in policy evaluation

S Zhang - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

A New Challenge in Policy Evaluation. Shangtong Zhang, University of Virginia, 85 Engineer’s Way, Charlottesville, VA, 22903, USA …

Cited by 1 · Related articles · All 2 versions

Framework and methods of diverse exploration for fast and safe policy improvement

L Yu, A Cohen - US Patent 11,568,236, 2023 - Google Patents

2019-03-19: Assigned to RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK (reassignment: RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF …)

Cited by 25 · Related articles · All 4 versions

[PDF] arxiv.org

Cited by 1 · Related articles

[PDF] umontreal.ca

Learning and planning with noise in optimization and reinforcement learning

V Thomas - 2023 - papyrus.bib.umontreal.ca

Most modern machine learning algorithms incorporate a degree of randomness in their
processes, which we will refer to as noise and which can ultimately impact the model's …

