AutoOPE: Automated Off-Policy Estimator Selection

N Felicioni, M Benigni, MF Dacrema - arXiv preprint arXiv:2406.18022, 2024 - arxiv.org
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
counterfactual policies using data collected by a different policy. This problem is of utmost …
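For the contextual-bandit version of this problem, the most common baseline estimator is inverse propensity scoring (IPS). The sketch below is a minimal, illustrative implementation; the data layout and the target_policy callable are assumptions for the example, not anything specified by the paper.

```python
import numpy as np

def ips_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    contexts:      (n, d) logged contexts
    actions:       (n,) actions chosen by the logging policy
    rewards:       (n,) observed rewards
    logging_probs: (n,) probability the logging policy assigned to each logged action
    target_policy: callable (context, action) -> probability under the policy
                   we want to evaluate (an assumed interface for this sketch)
    """
    weights = np.array([target_policy(x, a) for x, a in zip(contexts, actions)]) / logging_probs
    return np.mean(weights * rewards)
```

The variance of this estimate grows with the mismatch between the logging and target policies, which is one reason choosing among estimators (the topic of AutoOPE) matters in practice.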

K-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control

M Giegrich, R Oomen, C Reisinger - arXiv preprint arXiv:2306.04836, 2023 - arxiv.org
In this paper, we propose a novel $K$-nearest neighbor resampling procedure for
estimating the performance of a policy from historical data containing realized episodes of a …
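The snippet only names the procedure; below is a rough sketch of how one K-nearest-neighbor resampling rollout could look for a deterministic target policy. The helper names and data layout are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def knn_resample_episode(states, actions, rewards, next_states,
                         target_policy, start_state, horizon, k=5, seed=None):
    """Roll out one synthetic episode by resampling K-nearest logged transitions.

    At each step, find the k logged transitions whose (state, action) pair is
    closest to (current state, action chosen by the target policy), pick one at
    random, and reuse its reward and next state.
    """
    rng = np.random.default_rng(seed)
    sa_logged = np.column_stack([states, actions])   # logged (state, action) pairs
    s, total = np.atleast_1d(np.asarray(start_state, dtype=float)), 0.0
    for _ in range(horizon):
        a = np.atleast_1d(target_policy(s))          # action the target policy would take
        query = np.concatenate([s, a])
        idx = np.argsort(np.linalg.norm(sa_logged - query, axis=1))[:k]
        j = rng.choice(idx)                          # resample one nearby transition
        total += rewards[j]
        s = np.atleast_1d(next_states[j])
    return total
```

Averaging the returned totals over many such rollouts would give the performance estimate.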

Learning to control autonomous fleets from observation via offline reinforcement learning

C Schmidt, D Gammelli, FC Pereira… - 2024 European …, 2024 - ieeexplore.ieee.org
Autonomous Mobility-on-Demand (AMoD) systems are an evolving mode of transportation in
which a centrally coordinated fleet of self-driving vehicles dynamically serves travel …

Supervised off-policy ranking

Y Jin, Y Zhang, T Qin, X Zhang, J Yuan, H Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Off-policy evaluation (OPE) aims to evaluate a target policy using data generated by other
policies. Most previous OPE methods focus on precisely estimating the true performance of …
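The contrast drawn in the snippet is with methods that estimate absolute policy values; here policies are ranked instead. A generic supervised-ranking baseline in that spirit might look like the following; the featurization of policies and the choice of regressor are illustrative assumptions, not the method proposed in the paper.

```python
from sklearn.ensemble import GradientBoostingRegressor

def rank_policies(train_features, train_returns, test_features):
    """Score candidate policies with a model trained on policies of known return.

    train_features: (n_train, d) features describing training policies
                    (e.g., their actions on a fixed probe set of states)
    train_returns:  (n_train,) measured performance of those policies
    test_features:  (n_test, d) features of the policies to rank
    Returns scores whose ordering is the predicted ranking.
    """
    model = GradientBoostingRegressor().fit(train_features, train_returns)
    return model.predict(test_features)

# Ranking quality is commonly summarized by rank correlation with the true
# returns, e.g. scipy.stats.spearmanr(scores, true_test_returns).
```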

SOPE: Spectrum of off-policy estimators

C Yuan, Y Chandak, S Giguere… - Advances in …, 2021 - proceedings.neurips.cc
Many sequential decision making problems are high-stakes and require off-policy
evaluation (OPE) of a new policy using historical data collected using some other policy …
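One standard way to construct such a spectrum is to apply per-decision importance weights for the first j steps of each trajectory and hand off to a learned value function afterwards, so that j interpolates between a purely model-based estimate and full importance sampling. The sketch below illustrates that general idea (discounting omitted); the data layout and v_hat are assumptions, and this is not presented as the paper's exact estimator family.

```python
import numpy as np

def partial_is_estimate(episodes, v_hat, j):
    """Blend per-decision importance sampling (first j steps) with a value model.

    episodes: list of trajectories; each step is a dict with keys
              'state', 'reward', 'pi_e' (target prob of the logged action)
              and 'pi_b' (logging prob of the logged action)
    v_hat:    callable state -> estimated value of the target policy
    j:        number of importance-weighted steps; j=0 trusts v_hat entirely,
              large j approaches full per-decision importance sampling
    """
    values = []
    for traj in episodes:
        rho, value = 1.0, 0.0
        for t, step in enumerate(traj):
            if t < j:
                rho *= step['pi_e'] / step['pi_b']   # per-decision importance ratio
                value += rho * step['reward']
            else:
                value += rho * v_hat(step['state'])  # bootstrap with the value model
                break
        values.append(value)
    return np.mean(values)
```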

Explaining practical differences between treatment effect estimators with high dimensional asymptotics

S Yadlowsky - arXiv preprint arXiv:2203.12538, 2022 - arxiv.org
We revisit the classical causal inference problem of estimating the average treatment effect
in the presence of fully observed confounding variables using two-stage semiparametric …
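The two-stage estimators compared in this line of work typically include the augmented inverse propensity weighted (AIPW) estimator, which combines fitted outcome models with propensity weighting. A minimal version, assuming the nuisance models have already been fit elsewhere:

```python
import numpy as np

def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Augmented inverse propensity weighted (AIPW) estimate of the ATE.

    y:       (n,) observed outcomes
    t:       (n,) binary treatment indicators
    e_hat:   (n,) estimated propensity scores P(T=1 | X)
    mu1_hat: (n,) first-stage predictions of the outcome under treatment
    mu0_hat: (n,) first-stage predictions of the outcome under control
    """
    term1 = mu1_hat + t * (y - mu1_hat) / e_hat
    term0 = mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat)
    return np.mean(term1 - term0)
```

How the nuisance estimates e_hat, mu1_hat and mu0_hat are obtained drives the kind of practical differences such high-dimensional analyses examine.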

Engagement rewarded actor-critic with conservative Q-learning for speech-driven laughter backchannel generation

ÖZ Bayramoğlu, E Erzin, TM Sezgin… - Proceedings of the 2021 …, 2021 - dl.acm.org
We propose a speech-driven laughter backchannel generation model to reward
engagement during human-agent interaction. We formulate the problem as a Markov …
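The conservative Q-learning component referenced here penalizes Q-values for actions the logged interactions do not support. A minimal PyTorch version of that penalty for discrete actions is sketched below; the function signature and the way the TD loss is passed in are illustrative assumptions, not the authors' training code.

```python
import torch

def cql_loss(q_values, logged_actions, td_loss, alpha=1.0):
    """Conservative Q-learning objective for discrete actions.

    q_values:       (batch, n_actions) Q(s, a) for every action in each state
    logged_actions: (batch,) actions actually taken in the logged interactions
    td_loss:        scalar temporal-difference loss computed elsewhere
    alpha:          weight of the conservative penalty
    """
    # Push down Q-values overall (logsumexp over actions) while pushing up
    # the Q-values of actions that appear in the dataset.
    logsumexp_q = torch.logsumexp(q_values, dim=1)
    data_q = q_values.gather(1, logged_actions.long().unsqueeze(1)).squeeze(1)
    return td_loss + alpha * (logsumexp_q - data_q).mean()
```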

Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

V Liu, Y Chandak, P Thomas… - … Conference on Artificial …, 2023 - proceedings.mlr.press
In this work, we consider the off-policy policy evaluation problem for contextual bandits and
finite horizon reinforcement learning in the nonstationary setting. Reusing old data is critical …

Guideline-informed reinforcement learning for mechanical ventilation in critical care

F den Hengst, M Otten, P Elbers… - Artificial Intelligence in …, 2024 - Elsevier
Reinforcement Learning (RL) has recently found many applications in the healthcare
domain thanks to its natural fit to clinical decision-making and ability to learn optimal …

Long-term Off-Policy Evaluation and Learning

Y Saito, H Abdollahpouri, J Anderton… - Proceedings of the …, 2024 - dl.acm.org
Short- and long-term outcomes of an algorithm often differ, with damaging downstream
effects. A known example is a click-bait algorithm, which may increase short-term clicks but …