J Hanna, S Niekum, P Stone - International Conference on …, 2019 - proceedings.mlr.press
We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by …
Learning from interaction with the environment (trying untested actions, observing successes and failures, and tying effects back to causes) is one of the first capabilities we …
This work presents two reinforcement learning (RL) architectures that mimic how rational humans analyze the available information and make decisions. The …
The motivation behind this thesis is to provide efficient solutions for energy harvesting communications. Firstly, an energy harvesting underlay cognitive radio relaying network is …
The trade-off between exploration and exploitation is a classic problem in reinforcement learning that has been the focus of countless research efforts. Informally, the dilemma stems …