Universal off-policy evaluation

Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org

In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

Cited by 256 · Related articles · All 9 versions

[PDF] acm.org

Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Cited by 29 · Related articles · All 5 versions

[PDF] enseeiht.fr

[BOOK][B] Distributional reinforcement learning

MG Bellemare, W Dabney, M Rowland - 2023 - books.google.com

The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …

Cited by 118 · Related articles · All 9 versions

[PDF] arxiv.org

Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes

A Bennett, N Kallus - Operations Research, 2024 - pubsonline.informs.org

In applications of offline reinforcement learning to observational data, such as in healthcare
or education, a general concern is that observed actions might be affected by unobserved …

Cited by 35 · Related articles · All 7 versions

[PDF] mlr.press

Distributional offline policy evaluation with predictive error guarantees

R Wu, M Uehara, W Sun - International Conference on …, 2023 - proceedings.mlr.press

We study the problem of estimating the distribution of the return of a policy using an offline
dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE) …

Cited by 11 · Related articles · All 6 versions

[PDF] neurips.cc

Conformal off-policy prediction in contextual bandits

MF Taufiq, JF Ton, R Cornish… - Advances in Neural …, 2022 - proceedings.neurips.cc

Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …

Cited by 14 · Related articles · All 5 versions

[PDF] neurips.cc

AllSim: Simulating and benchmarking resource allocation policies in multi-user systems

J Berrevoets, D Jarrett, A Chan… - Advances in Neural …, 2024 - proceedings.neurips.cc

Numerous real-world systems, ranging from healthcare to energy grids, involve users
competing for finite and potentially scarce resources. Designing policies for resource …

Cited by 1 · Related articles · All 5 versions

[PDF] arxiv.org

HOPE: Human-centric off-policy evaluation for e-learning and healthcare

G Gao, S Ju, MS Ausin, M Chi - arXiv preprint arXiv:2302.09212, 2023 - arxiv.org

Reinforcement learning (RL) has been extensively researched for enhancing
human-environment interactions in various human-centric tasks, including e-learning and …

Cited by 10 · Related articles · All 6 versions

[PDF] arxiv.org

Opportunities and challenges from using animal videos in reinforcement learning for navigation

V Giammarino, J Queeney, LC Carstensen… - IFAC-PapersOnLine, 2023 - Elsevier

We investigate the use of animal videos (observations) to improve Reinforcement Learning
(RL) efficiency and performance in navigation tasks with sparse rewards. Motivated by …

Cited by 3 · Related articles · All 5 versions

[PDF] neurips.cc

Off-policy evaluation for action-dependent non-stationary environments

Y Chandak, S Shankar, N Bastian… - Advances in …, 2022 - proceedings.neurips.cc

Methods for sequential decision-making are often built upon a foundational assumption that
the underlying decision process is stationary. This limits the application of such methods …

Cited by 5 · Related articles · All 7 versions

Towards continual reinforcement learning: A review and perspectives
