Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively …
The first comprehensive guide to distributional reinforcement learning, providing a new mathematical formalism for thinking about decisions from a probabilistic perspective …
A Bennett, N Kallus - Operations Research, 2024 - pubsonline.informs.org
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved …
R Wu, M Uehara, W Sun - International Conference on …, 2023 - proceedings.mlr.press
We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE) …
Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic …
Numerous real-world systems, ranging from healthcare to energy grids, involve users competing for finite and potentially scarce resources. Designing policies for resource …
Reinforcement learning (RL) has been extensively researched for enhancing human-environment interactions in various human-centric tasks, including e-learning and …
We investigate the use of animal videos (observations) to improve Reinforcement Learning (RL) efficiency and performance in navigation tasks with sparse rewards. Motivated by …
Y Chandak, S Shankar, N Bastian… - Advances in …, 2022 - proceedings.neurips.cc
Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods …