AutoOPE: Automated Off-Policy Estimator Selection

N Felicioni, M Benigni, MF Dacrema - arXiv preprint arXiv:2406.18022, 2024 - arxiv.org
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
counterfactual policies using data collected by a different policy. This problem is of utmost …
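For the contextual-bandit version of this problem, the most common baseline estimator is inverse propensity scoring (IPS). The sketch below is a minimal, illustrative implementation; the data layout and the target_policy callable are assumptions for the example, not anything specified by the paper.

```python
import numpy as np

def ips_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    contexts:      (n, d) logged contexts
    actions:       (n,) actions chosen by the logging policy
    rewards:       (n,) observed rewards
    logging_probs: (n,) probability the logging policy assigned to each logged action
    target_policy: callable (context, action) -> probability under the policy
                   we want to evaluate (an assumed interface for this sketch)
    """
    weights = np.array([target_policy(x, a) for x, a in zip(contexts, actions)]) / logging_probs
    return np.mean(weights * rewards)
```

The variance of this estimate grows with the mismatch between the logging and target policies, which is one reason choosing among estimators (the topic of AutoOPE) matters in practice.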

K-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control

M Giegrich, R Oomen, C Reisinger - arXiv preprint arXiv:2306.04836, 2023 - arxiv.org
In this paper, we propose a novel $K$-nearest neighbor resampling procedure for
estimating the performance of a policy from historical data containing realized episodes of a …
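The snippet only names the procedure; below is a rough sketch of how one K-nearest-neighbor resampling rollout could look for a deterministic target policy. The helper names and data layout are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def knn_resample_episode(states, actions, rewards, next_states,
                         target_policy, start_state, horizon, k=5, seed=None):
    """Roll out one synthetic episode by resampling K-nearest logged transitions.

    At each step, find the k logged transitions whose (state, action) pair is
    closest to (current state, action chosen by the target policy), pick one at
    random, and reuse its reward and next state.
    """
    rng = np.random.default_rng(seed)
    sa_logged = np.column_stack([states, actions])   # logged (state, action) pairs
    s, total = np.atleast_1d(np.asarray(start_state, dtype=float)), 0.0
    for _ in range(horizon):
        a = np.atleast_1d(target_policy(s))          # action the target policy would take
        query = np.concatenate([s, a])
        idx = np.argsort(np.linalg.norm(sa_logged - query, axis=1))[:k]
        j = rng.choice(idx)                          # resample one nearby transition
        total += rewards[j]
        s = np.atleast_1d(next_states[j])
    return total
```

Averaging the returned totals over many such rollouts would give the performance estimate.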

Learning to control autonomous fleets from observation via offline reinforcement learning

C Schmidt, D Gammelli, FC Pereira… - 2024 European …, 2024 - ieeexplore.ieee.org
Autonomous Mobility-on-Demand (AMoD) systems are an evolving mode of transportation in
which a centrally coordinated fleet of self-driving vehicles dynamically serves travel …

Supervised off-policy ranking

Y Jin, Y Zhang, T Qin, X Zhang, J Yuan, H Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Off-policy evaluation (OPE) aims to evaluate a target policy using data generated by other
policies. Most previous OPE methods focus on precisely estimating the true performance of …
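The contrast drawn in the snippet is with methods that estimate absolute policy values; here policies are ranked instead. A generic supervised-ranking baseline in that spirit might look like the following; the featurization of policies and the choice of regressor are illustrative assumptions, not the method proposed in the paper.

```python
from sklearn.ensemble import GradientBoostingRegressor

def rank_policies(train_features, train_returns, test_features):
    """Score candidate policies with a model trained on policies of known return.

    train_features: (n_train, d) features describing training policies
                    (e.g., their actions on a fixed probe set of states)
    train_returns:  (n_train,) measured performance of those policies
    test_features:  (n_test, d) features of the policies to rank
    Returns scores whose ordering is the predicted ranking.
    """
    model = GradientBoostingRegressor().fit(train_features, train_returns)
    return model.predict(test_features)

# Ranking quality is commonly summarized by rank correlation with the true
# returns, e.g. scipy.stats.spearmanr(scores, true_test_returns).
```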

SOPE: Spectrum of off-policy estimators

C Yuan, Y Chandak, S Giguere… - Advances in …, 2021 - proceedings.neurips.cc
Many sequential decision making problems are high-stakes and require off-policy
evaluation (OPE) of a new policy using historical data collected using some other policy …
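One standard way to construct such a spectrum is to apply per-decision importance weights for the first j steps of each trajectory and hand off to a learned value function afterwards, so that j interpolates between a purely model-based estimate and full importance sampling. The sketch below illustrates that general idea (discounting omitted); the data layout and v_hat are assumptions, and this is not presented as the paper's exact estimator family.

```python
import numpy as np

def partial_is_estimate(episodes, v_hat, j):
    """Blend per-decision importance sampling (first j steps) with a value model.

    episodes: list of trajectories; each step is a dict with keys
              'state', 'reward', 'pi_e' (target prob of the logged action)
              and 'pi_b' (logging prob of the logged action)
    v_hat:    callable state -> estimated value of the target policy
    j:        number of importance-weighted steps; j=0 trusts v_hat entirely,
              large j approaches full per-decision importance sampling
    """
    values = []
    for traj in episodes:
        rho, value = 1.0, 0.0
        for t, step in enumerate(traj):
            if t < j:
                rho *= step['pi_e'] / step['pi_b']   # per-decision importance ratio
                value += rho * step['reward']
            else:
                value += rho * v_hat(step['state'])  # bootstrap with the value model
                break
        values.append(value)
    return np.mean(values)
```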

Explaining practical differences between treatment effect estimators with high dimensional asymptotics

S Yadlowsky - arXiv preprint arXiv:2203.12538, 2022 - arxiv.org
We revisit the classical causal inference problem of estimating the average treatment effect
in the presence of fully observed confounding variables using two-stage semiparametric …
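The two-stage estimators compared in this line of work typically include the augmented inverse propensity weighted (AIPW) estimator, which combines fitted outcome models with propensity weighting. A minimal version, assuming the nuisance models have already been fit elsewhere:

```python
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Augmented inverse propensity weighted (AIPW) estimate of the ATE.

    y:       (n,) observed outcomes
    t:       (n,) binary treatment indicators
    e_hat:   (n,) estimated propensity scores P(T=1 | X)
    mu1_hat: (n,) first-stage predictions of the outcome under treatment
    mu0_hat: (n,) first-stage predictions of the outcome under control
    """
    term1 = mu1_hat + t * (y - mu1_hat) / e_hat
    term0 = mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat)
    return np.mean(term1 - term0)
```

How the nuisance estimates e_hat, mu1_hat and mu0_hat are obtained drives the kind of practical differences such high-dimensional analyses examine.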

Engagement rewarded actor-critic with conservative Q-learning for speech-driven laughter backchannel generation

ÖZ Bayramoğlu, E Erzin, TM Sezgin… - Proceedings of the 2021 …, 2021 - dl.acm.org
We propose a speech-driven laughter backchannel generation model to reward
engagement during human-agent interaction. We formulate the problem as a Markov …
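The conservative Q-learning component referenced here penalizes Q-values for actions the logged interactions do not support. A minimal PyTorch version of that penalty for discrete actions is sketched below; the function signature and the way the TD loss is passed in are illustrative assumptions, not the authors' training code.

```python
import torch

def cql_loss(q_values, logged_actions, td_loss, alpha=1.0):
    """Conservative Q-learning objective for discrete actions.

    q_values:       (batch, n_actions) Q(s, a) for every action in each state
    logged_actions: (batch,) actions actually taken in the logged interactions
    td_loss:        scalar temporal-difference loss computed elsewhere
    alpha:          weight of the conservative penalty
    """
    # Push down Q-values overall (logsumexp over actions) while pushing up
    # the Q-values of actions that appear in the dataset.
    logsumexp_q = torch.logsumexp(q_values, dim=1)
    data_q = q_values.gather(1, logged_actions.long().unsqueeze(1)).squeeze(1)
    return td_loss + alpha * (logsumexp_q - data_q).mean()
```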

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

V Liu, Y Chandak, P Thomas… - … Conference on Artificial …, 2023 - proceedings.mlr.press
In this work, we consider the off-policy policy evaluation problem for contextual bandits and
finite horizon reinforcement learning in the nonstationary setting. Reusing old data is critical …

Guideline-informed reinforcement learning for mechanical ventilation in critical care

F den Hengst, M Otten, P Elbers… - Artificial Intelligence in …, 2024 - Elsevier
Reinforcement Learning (RL) has recently found many applications in the healthcare
domain thanks to its natural fit to clinical decision-making and ability to learn optimal …

Long-term Off-Policy Evaluation and Learning

Y Saito, H Abdollahpouri, J Anderton… - Proceedings of the …, 2024 - dl.acm.org
Short- and long-term outcomes of an algorithm often differ, with damaging downstream
effects. A known example is a click-bait algorithm, which may increase short-term clicks but …