Off-policy risk assessment in contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Cited by 30 · Related articles · All 5 versions

[PDF] arxiv.org

Probabilistic design of optimal sequential decision-making algorithms in learning and control

É Garrabé, G Russo - Annual Reviews in Control, 2022 - Elsevier

This survey is focused on certain sequential decision-making problems that involve
optimizing over probability functions. We discuss the relevance of these problems for …

Cited by 11 · Related articles · All 4 versions

[PDF] neurips.cc

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc

When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Cited by 56 · Related articles · All 11 versions

[PDF] mlr.press

Distributional offline policy evaluation with predictive error guarantees

R Wu, M Uehara, W Sun - International Conference on …, 2023 - proceedings.mlr.press

We study the problem of estimating the distribution of the return of a policy using an offline
dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE) …

Cited by 18 · Related articles · All 6 versions

[PDF] neurips.cc

Conformal off-policy prediction in contextual bandits

MF Taufiq, JF Ton, R Cornish… - Advances in Neural …, 2022 - proceedings.neurips.cc

Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …

Cited by 19 · Related articles · All 5 versions

[PDF] mlr.press

Supervised learning with general risk functionals

L Leqi, A Huang, Z Lipton… - … on Machine Learning, 2022 - proceedings.mlr.press

Standard uniform convergence results bound the generalization gap of the expected loss
over a hypothesis class. The emergence of risk-sensitive learning requires generalization …

Cited by 10 · Related articles · All 4 versions

[PDF] acm.org

CONSEQUENCES—Causality, Counterfactuals and Sequential Decision-Making for Recommender Systems

O Jeunen, T Joachims, H Oosterhuis, Y Saito… - Proceedings of the 16th …, 2022 - dl.acm.org

Recommender systems are more and more often modelled as repeated decision making
processes–deciding which (ranking of) items to recommend to a given user. Each decision …

Cited by 9 · Related articles · All 4 versions

[PDF] sciencedirect.com

Risk verification of stochastic systems with neural network controllers

M Cleaveland, L Lindemann, R Ivanov, GJ Pappas - Artificial Intelligence, 2022 - Elsevier

Motivated by the fragility of neural network (NN) controllers in safety-critical applications, we
present a data-driven framework for verifying the risk of stochastic dynamical systems with …

Cited by 11 · Related articles · All 8 versions

[PDF] arxiv.org

Tipping point forecasting in non-stationary dynamics on function spaces

M Liu-Schiaffini, CE Singer, N Kovachki… - arXiv preprint arXiv …, 2023 - arxiv.org

Tipping points are abrupt, drastic, and often irreversible changes in the evolution of
non-stationary and chaotic dynamical systems. For instance, increased greenhouse gas …

Cited by 6 · Related articles · All 2 versions

[PDF] mlr.press

Regret bounds for risk-sensitive reinforcement learning with Lipschitz dynamic risk measures

H Liang, Z Luo - International Conference on Artificial …, 2024 - proceedings.mlr.press

We study finite episodic Markov decision processes incorporating dynamic risk measures to
capture risk sensitivity. To this end, we present two model-based algorithms applied to …

Cited by 3 · Related articles · All 4 versions
