On the relation between policy improvement and off-policy minimum-variance policy evaluation

AM Metelli, S Meta, M Restelli - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …

A new challenge in policy evaluation

S Zhang - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
A New Challenge in Policy Evaluation Page 1 A New Challenge in Policy Evaluation Shangtong
Zhang University of Virginia 85 Engineer’s Way, Charlottesville, VA, 22903, USA …

Framework and methods of diverse exploration for fast and safe policy improvement

L Yu, A Cohen - US Patent 11,568,236, 2023 - Google Patents
2019-03-19 Assigned to RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF
NEW YORK reassignment RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF …

Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery

E Nikonova, C Xue, J Renz - arXiv preprint arXiv:2311.14270, 2023 - arxiv.org
Deep reinforcement learning suffers from catastrophic forgetting and sample inefficiency
making it less applicable to the ever-changing real world. However, the ability to use …

Improving Monte Carlo Evaluation with Offline Data

S Liu, S Zhang - arXiv preprint arXiv:2301.13734, 2023 - arxiv.org
Monte Carlo (MC) methods are the most widely used methods to estimate the performance
of a policy. Given an interested policy, MC methods give estimates by repeatedly running …

Estimation and control of visitation distributions for reinforcement learning

I Durugkar - 2023 - repositories.lib.utexas.edu
In sequential decision making tasks an agent needs to make decisions and interact with the
world in order to maximize its long-term expected utility. These tasks are complex since the …

Learning and planning with noise in optimization and reinforcement learning

V Thomas - 2023 - papyrus.bib.umontreal.ca
Most modern machine learning algorithms incorporate a degree of randomness in their
processes, which we will refer to as noise, which can ultimately impact the model's …