Understanding the curse of horizon in off-policy evaluation via conditional importance sampling

Y Liu, PL Bacon, E Brunskill - International Conference on …, 2020 - proceedings.mlr.press
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …

An Empirical Survey of Bandits in an Industrial Recommender System Setting

T Schwarz, J Brandby - 2023 - gupea.ub.gu.se
In this thesis, the effects of incorporating unstructured data—images in the wild—in
contextual multi-armed bandits are investigated, when used within a recommender system …

An Empirical Survey of Bandits in an Industrial Recommender System Setting

J Brandby, T Schwarz - 2023 - odr.chalmers.se
In this thesis, the effects of incorporating unstructured data—images in the wild—in
contextual multi-armed bandits are investigated, when used within a recommender system …

Machine Learning for Information Extraction from Pathology Reports and Adaptive Offline Value Estimation in Reinforcement Learning

B Park - 2022 - search.proquest.com
The thesis is divided into two parts. The first part focuses on a healthcare-related application
of machine learning, and the second part focuses on offline evaluation of reinforcement …

[BOOK][B] Adaptive and Efficient Batch Reinforcement Learning Algorithms

Y Liu - 2021 - search.proquest.com
Reinforcement learning (RL) focuses on solving the problem of sequential decision-making
in an unknown environment and achieved many successes in domains with good simulators …