Empirical study of off-policy policy evaluation for reinforcement learning

B Kang, X Ma, Y Wang, Y Yue, S Yan - arXiv preprint arXiv:2306.00972, 2023 - arxiv.org

Recently, Offline Reinforcement Learning (RL) has achieved remarkable progress with the
emergence of various algorithms and datasets. However, these methods usually focus on …

Cited by 3 · Related articles · All 2 versions

Semi-supervised off-policy reinforcement learning and value estimation for dynamic treatment regimes

A Sonabend-W, N Laha, AN Ananthakrishnan… - Journal of Machine …, 2023 - jmlr.org

Reinforcement learning (RL) has shown great promise in estimating dynamic treatment
regimes which take into account patient heterogeneity. However, health-outcome …

Cited by 2 · Related articles

Minimax model learning

C Voloshin, N Jiang, Y Yue - International Conference on …, 2021 - proceedings.mlr.press

We present a novel off-policy loss function for learning a transition model in model-based
reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation …

Cited by 14 · Related articles · All 9 versions

When is off-policy evaluation useful? a data-centric perspective

H Sun, AJ Chan, N Seedat, A Hüyük… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …

Cited by 2 · Related articles · All 3 versions

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

H Kiyohara, R Kishimoto, K Kawakami… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper introduces SCOPE-RL, a comprehensive open-source Python software designed
for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection …

Cited by 2 · Related articles · All 2 versions

Optimizing Representations and Policies for Question Sequencing Using Reinforcement Learning

AZ Azhar, A Segal, K Gal - International Educational Data Mining Society, 2022 - ERIC

This paper studies the use of Reinforcement Learning (RL) policies for optimizing the
sequencing of online learning materials to students. Our approach provides an end to end …

Cited by 5 · Related articles · All 4 versions

Off-policy evaluation with online adaptation for robot exploration in challenging environments

Y Hu, J Geng, C Wang, J Keller… - IEEE Robotics and …, 2023 - ieeexplore.ieee.org

Autonomous exploration has many important applications. However, classic information
gain-based or frontier-based exploration only relies on the robot current state to determine …

Cited by 10 · Related articles · All 5 versions

Accelerating offline reinforcement learning application in real-time bidding and recommendation: Potential use of simulation

H Kiyohara, K Kawakami, Y Saito - arXiv preprint arXiv:2109.08331, 2021 - arxiv.org

In recommender systems (RecSys) and real-time bidding (RTB) for online advertisements,
we often try to optimize sequential decision making using bandit and reinforcement learning …

Cited by 10 · Related articles · All 2 versions

Deep reinforcement learning approaches for technology enhanced learning

Z Li - 2023 - etheses.dur.ac.uk

Artificial Intelligence (AI) has advanced significantly in recent years, transforming various
industries and domains. Its ability to extract patterns and insights from large volumes of data …

Cited by 6 · Related articles · All 5 versions

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

L Da, P Jenkins, T Schwantes, J Dotson… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

In practice, it is essential to compare and rank candidate policies offline before real-world
deployment for safety and reliability. Prior work seeks to solve this offline policy ranking …
