Off-policy evaluation of slate bandit policies via optimizing abstraction

H Kiyohara, M Nomura, Y Saito - Proceedings of the ACM on Web …, 2024 - dl.acm.org
We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a
policy selects multi-dimensional actions known as slates. This problem is widespread in …

Nested replicator dynamics, nested logit choice, and similarity-based learning

P Mertikopoulos, WH Sandholm - Journal of Economic Theory, 2024 - Elsevier
We consider a model of learning and evolution in games whose action sets are endowed
with a partition-based similarity structure intended to capture exogenous similarities …

Efficient methods in counterfactual policy learning and sequential decision making

H Zenati - 2023 - theses.hal.science
Because logged data has become ubiquitous in a wide range of applications and since
online exploration may be sensitive, counterfactual methods have gained significant …