Improving and benchmarking offline reinforcement learning algorithms

B Kang, X Ma, Y Wang, Y Yue, S Yan - arXiv preprint arXiv:2306.00972, 2023 - arxiv.org
Recently, Offline Reinforcement Learning (RL) has achieved remarkable progress with the
emergence of various algorithms and datasets. However, these methods usually focus on …

Semi-supervised off-policy reinforcement learning and value estimation for dynamic treatment regimes

A Sonabend-W, N Laha, AN Ananthakrishnan… - Journal of Machine …, 2023 - jmlr.org
Reinforcement learning (RL) has shown great promise in estimating dynamic treatment
regimes which take into account patient heterogeneity. However, health-outcome …

Minimax model learning

C Voloshin, N Jiang, Y Yue - International Conference on …, 2021 - proceedings.mlr.press
We present a novel off-policy loss function for learning a transition model in model-based
reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation …

When is off-policy evaluation useful? A data-centric perspective

H Sun, AJ Chan, N Seedat, A Hüyük… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

H Kiyohara, R Kishimoto, K Kawakami… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces SCOPE-RL, a comprehensive open-source Python software designed
for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection …

Optimizing Representations and Policies for Question Sequencing Using Reinforcement Learning

AZ Azhar, A Segal, K Gal - International Educational Data Mining Society, 2022 - ERIC
This paper studies the use of Reinforcement Learning (RL) policies for optimizing the
sequencing of online learning materials to students. Our approach provides an end-to-end …

Off-policy evaluation with online adaptation for robot exploration in challenging environments

Y Hu, J Geng, C Wang, J Keller… - IEEE Robotics and …, 2023 - ieeexplore.ieee.org
Autonomous exploration has many important applications. However, classic information
gain-based or frontier-based exploration only relies on the robot's current state to determine …

Accelerating offline reinforcement learning application in real-time bidding and recommendation: Potential use of simulation

H Kiyohara, K Kawakami, Y Saito - arXiv preprint arXiv:2109.08331, 2021 - arxiv.org
In recommender systems (RecSys) and real-time bidding (RTB) for online advertisements,
we often try to optimize sequential decision making using bandit and reinforcement learning …

Deep reinforcement learning approaches for technology enhanced learning

Z Li - 2023 - etheses.dur.ac.uk
Artificial Intelligence (AI) has advanced significantly in recent years, transforming various
industries and domains. Its ability to extract patterns and insights from large volumes of data …

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

L Da, P Jenkins, T Schwantes, J Dotson… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In practice, it is essential to compare and rank candidate policies offline before real-world
deployment for safety and reliability. Prior work seeks to solve this offline policy ranking …