Data-efficient policy evaluation through behavior policy search

Y Bai, T Xie, N Jiang, YX Wang - Advances in Neural …, 2019 - proceedings.neurips.cc

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is,
algorithms that change its exploration policy as infrequently as possible during regret …

Cited by 101 · Related articles · All 10 versions


Importance sampling policy evaluation with an estimated behavior policy

J Hanna, S Niekum, P Stone - International Conference on …, 2019 - proceedings.mlr.press

We consider the problem of off-policy evaluation in Markov decision processes. Off-policy
evaluation is the task of evaluating the expected return of one policy with data generated by …

Cited by 71 · Related articles · All 14 versions
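The snippet above defines off-policy evaluation. As a rough illustration of the idea named in the title, the following minimal Python sketch applies ordinary importance sampling to a one-step (bandit) problem. In the importance weights, the behavior-policy probabilities are replaced by empirical action frequencies taken from the logged data. The single-state setting, the hypothetical helper names `estimate_behavior_policy` and `is_estimate`, and the toy log are all assumptions made for illustration. This is not the estimator or the experimental setup from the paper.

```python
from collections import Counter

def estimate_behavior_policy(actions, n_actions):
    # Hypothetical helper (assumption): empirical frequency of each action in
    # the log, i.e. the maximum-likelihood estimate of a state-independent
    # behavior policy.
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in range(n_actions)]

def is_estimate(actions, rewards, target_probs, behavior_probs):
    # Ordinary importance sampling: reweight each logged reward by
    # pi_target(a) / pi_behavior(a), then average over the log.
    total = 0.0
    for a, r in zip(actions, rewards):
        total += target_probs[a] / behavior_probs[a] * r
    return total / len(actions)

# Toy log from a two-armed bandit (assumed data): action 1 always pays 1,
# action 0 pays 0. The logging policy was uniform (0.5, 0.5), but action 1
# happens to appear only once in four draws.
actions = [0, 0, 0, 1]
rewards = [0.0, 0.0, 0.0, 1.0]
target = [0.2, 0.8]  # target policy; its true expected reward is 0.8

true_behavior = [0.5, 0.5]
est_behavior = estimate_behavior_policy(actions, n_actions=2)  # [0.75, 0.25]

print(is_estimate(actions, rewards, target, true_behavior))  # 0.4
print(is_estimate(actions, rewards, target, est_behavior))   # 0.8
```

On this hand-picked log, weighting by the true behavior probabilities gives 0.4, because the rewarding action is under-represented in the sample. Weighting by the estimated probabilities corrects for the observed frequencies and recovers 0.8. A single toy sample like this says nothing about the estimator's general statistical properties.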


[BOOK] Data efficient reinforcement learning with off-policy and simulated data

JP Hanna - 2019 - search.proquest.com

Learning from interaction with the environment--trying untested actions, observing
successes and failures, and tying effects back to causes--is one of the first capabilities we …

Cited by 9 · Related articles · All 4 versions


Selector-actor-critic and tuner-actor-critic algorithms for reinforcement learning

A Masadeh, Z Wang, AE Kamal - 2019 11th International …, 2019 - ieeexplore.ieee.org

This work presents two reinforcement learning (RL) architectures, which mimic rational
humans in the way of analyzing the available information and making decisions. The …

Cited by 4 · Related articles · All 6 versions

Enhancing the performance of energy harvesting wireless communications using optimization and machine learning

A Masadeh - 2019 - search.proquest.com

The motivation behind this thesis is to provide efficient solutions for energy harvesting
communications. Firstly, an energy harvesting underlay cognitive radio relaying network is …

Cited by 4 · Related articles · All 3 versions

Diverse Exploration in Reinforcement Learning

A Cohen - 2019 - search.proquest.com

The trade-off between exploration and exploitation is a classic problem in reinforcement
learning that has been the focus of countless research efforts. Informally, the dilemma stems …

[CITATION] Enhancing the performance of energy harvesting wireless communications using optimization and machine learning

AM Ala'eddin - 2019

[CITATION] Adaptive Off-Policy Policy Gradient Methods

X Gu, J Hanna
