Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Probabilistic design of optimal sequential decision-making algorithms in learning and control

É Garrabé, G Russo - Annual Reviews in Control, 2022 - Elsevier
This survey is focused on certain sequential decision-making problems that involve
optimizing over probability functions. We discuss the relevance of these problems for …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Distributional offline policy evaluation with predictive error guarantees

R Wu, M Uehara, W Sun - International Conference on …, 2023 - proceedings.mlr.press
We study the problem of estimating the distribution of the return of a policy using an offline
dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE) …

Conformal off-policy prediction in contextual bandits

MF Taufiq, JF Ton, R Cornish… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …

Supervised learning with general risk functionals

L Leqi, A Huang, Z Lipton… - … on Machine Learning, 2022 - proceedings.mlr.press
Standard uniform convergence results bound the generalization gap of the expected loss
over a hypothesis class. The emergence of risk-sensitive learning requires generalization …

CONSEQUENCES—Causality, Counterfactuals and Sequential Decision-Making for Recommender Systems

O Jeunen, T Joachims, H Oosterhuis, Y Saito… - Proceedings of the 16th …, 2022 - dl.acm.org
Recommender systems are more and more often modelled as repeated decision making
processes: deciding which (ranking of) items to recommend to a given user. Each decision …

Risk verification of stochastic systems with neural network controllers

M Cleaveland, L Lindemann, R Ivanov, GJ Pappas - Artificial Intelligence, 2022 - Elsevier
Motivated by the fragility of neural network (NN) controllers in safety-critical applications, we
present a data-driven framework for verifying the risk of stochastic dynamical systems with …

Tipping point forecasting in non-stationary dynamics on function spaces

M Liu-Schiaffini, CE Singer, N Kovachki… - arXiv preprint arXiv …, 2023 - arxiv.org
Tipping points are abrupt, drastic, and often irreversible changes in the evolution of non-stationary
and chaotic dynamical systems. For instance, increased greenhouse gas …

Regret bounds for risk-sensitive reinforcement learning with Lipschitz dynamic risk measures

H Liang, Z Luo - International Conference on Artificial …, 2024 - proceedings.mlr.press
We study finite episodic Markov decision processes incorporating dynamic risk measures to
capture risk sensitivity. To this end, we present two model-based algorithms applied to …