Empirical analysis of off-policy policy evaluation for reinforcement learning

5 results (0.02 seconds)

Search within citing articles

[PDF] mlr.press

Understanding the curse of horizon in off-policy evaluation via conditional importance sampling

Y Liu, PL Bacon, E Brunskill - International Conference on …, 2020 - proceedings.mlr.press

Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …

Cited by 42 · Related articles · All 7 versions

[PDF] gu.se

An Empirical Survey of Bandits in an Industrial Recommender System Setting

T Schwarz, J Brandby - 2023 - gupea.ub.gu.se

In this thesis, the effects of incorporating unstructured data—images in the wild—in
contextual multi-armed bandits are investigated, when used within a recommender system …

[PDF] chalmers.se

An Empirical Survey of Bandits in an Industrial Recommender System Setting

J Brandby, T Schwarz - 2023 - odr.chalmers.se

In this thesis, the effects of incorporating unstructured data—images in the wild—in
contextual multi-armed bandits are investigated, when used within a recommender system …

Machine Learning for Information Extraction from Pathology Reports and Adaptive Offline Value Estimation in Reinforcement Learning

B Park - 2022 - search.proquest.com

The thesis is divided into two parts. The first part focuses on a healthcare-related application
of machine learning, and the second part focuses on offline evaluation of reinforcement …

[BOOK][B] Adaptive and Efficient Batch Reinforcement Learning Algorithms

Y Liu - 2021 - search.proquest.com

Reinforcement learning (RL) focuses on solving the problem of sequential decision-making
in an unknown environment and achieved many successes in domains with good simulators …
