Provably efficient Q-learning with low switching cost

Y Bai, T Xie, N Jiang, YX Wang - Advances in Neural …, 2019 - proceedings.neurips.cc
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is,
algorithms that change their exploration policy as infrequently as possible during regret …

Importance sampling policy evaluation with an estimated behavior policy

J Hanna, S Niekum, P Stone - International Conference on …, 2019 - proceedings.mlr.press
We consider the problem of off-policy evaluation in Markov decision processes. Off-policy
evaluation is the task of evaluating the expected return of one policy with data generated by …

[BOOK][B] Data efficient reinforcement learning with off-policy and simulated data

JP Hanna - 2019 - search.proquest.com
Learning from interaction with the environment--trying untested actions, observing
successes and failures, and tying effects back to causes--is one of the first capabilities we …

Selector-actor-critic and tuner-actor-critic algorithms for reinforcement learning

A Masadeh, Z Wang, AE Kamal - 2019 11th International …, 2019 - ieeexplore.ieee.org
This work presents two reinforcement learning (RL) architectures, which mimic rational
humans in the way of analyzing the available information and making decisions. The …

Enhancing the performance of energy harvesting wireless communications using optimization and machine learning

A Masadeh - 2019 - search.proquest.com
The motivation behind this thesis is to provide efficient solutions for energy harvesting
communications. Firstly, an energy harvesting underlay cognitive radio relaying network is …

Diverse Exploration in Reinforcement Learning

A Cohen - 2019 - search.proquest.com
The trade-off between exploration and exploitation is a classic problem in reinforcement
learning that has been the focus of countless research efforts. Informally, the dilemma stems …

[CITATION][C] Enhancing the performance of energy harvesting wireless communications using optimization and machine learning

[CITATION][C] Adaptive Off-Policy Policy Gradient Methods

X Gu, J Hanna