Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset …
Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the …
S Athey, S Wager - Econometrica, 2021 - Wiley Online Library
In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or …
We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications …
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL, particularly when coupled with (value) function …
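The snippet above concerns offline RL with value function approximation. As a generic illustration of the idea (not the method of any paper listed here), the sketch below runs tabular fitted Q-iteration over a fixed batch of logged `(s, a, r, s')` transitions; the environment, state/action counts, and discount are hypothetical.

```python
def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, iters=50):
    """Tabular fitted Q-iteration over a fixed batch of logged transitions
    (s, a, r, s'); no further environment interaction is required."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Compute Bellman targets from the frozen batch and the current Q.
        targets = {}
        for s, a, r, s2 in transitions:
            targets.setdefault((s, a), []).append(r + gamma * max(Q[s2]))
        # "Fit" step: in the tabular case, regression reduces to averaging.
        for (s, a), ys in targets.items():
            Q[s][a] = sum(ys) / len(ys)
    return Q
```

With function approximation, the averaging step is replaced by fitting a regressor to the same targets, which is exactly where the distribution-shift issues discussed in the offline RL literature arise.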
Z Li, J Chen, E Laber, F Liu… - International Statistical …, 2023 - Wiley Online Library
A treatment regime is a sequence of decision rules, one per decision point, that maps accumulated patient information to a recommended intervention. An optimal treatment …
When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision problems can estimate the performance of evaluation …
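When decisions depend only on observed features, the classic OPE tool is importance sampling: reweight each logged trajectory by the likelihood ratio between the evaluation and behavior policies. A minimal sketch of ordinary (per-trajectory) importance sampling, with hypothetical `pi_e`/`pi_b` probability functions, is:

```python
def is_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary importance-sampling estimate of the evaluation policy's
    expected discounted return from behavior-policy trajectories.

    trajectories: list of trajectories, each a list of (state, action, reward)
    pi_e(a, s), pi_b(a, s): action probabilities under the two policies
    """
    estimates = []
    for traj in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += (gamma ** t) * r           # discounted return
        estimates.append(ratio * ret)
    return sum(estimates) / len(estimates)
```

The estimator is unbiased when the behavior policy is known and covers the evaluation policy's actions, but its variance grows quickly with horizon, which motivates the weighted and doubly robust variants studied in the OPE literature.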
Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively recently. Much of the prior work has yielded instance-independent …
This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation …
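One common recipe for policy selection under a restricted interaction budget (a generic sketch, not necessarily the approach of the paper above) is to rank candidates by an offline OPE score and spend the online budget validating only the shortlist. All function and parameter names here are hypothetical:

```python
def select_policy(candidates, ope_estimate, online_rollout, budget):
    """Rank candidate policies by an offline score, then spend the limited
    online-interaction budget evaluating only the top-ranked ones.

    candidates:     iterable of policy identifiers
    ope_estimate:   policy -> offline (logged-data) value estimate
    online_rollout: policy -> measured online return (one budget unit each)
    budget:         number of online evaluations allowed
    """
    ranked = sorted(candidates, key=ope_estimate, reverse=True)
    shortlist = ranked[:budget]                       # spend budget on top-k
    online_scores = {p: online_rollout(p) for p in shortlist}
    return max(online_scores, key=online_scores.get)  # best validated policy
```

The interesting design question, which such work studies, is how to trade off shortlist size against per-policy evaluation accuracy when OPE scores are noisy.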