Belief reward shaping in reinforcement learning

O Marom, B Rosman - Proceedings of the AAAI conference on artificial …, 2018 - ojs.aaai.org
A key challenge in many reinforcement learning problems is delayed rewards, which can
significantly slow down learning. Although reward shaping has previously been introduced …

A Bayesian sampling approach to exploration in reinforcement learning

J Asmuth, L Li, ML Littman, A Nouri… - arXiv preprint arXiv …, 2012 - arxiv.org
We present a modular approach to reinforcement learning that uses a Bayesian
representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set) …
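The sampling idea behind BOSS can be illustrated on a small tabular problem. Several transition models are drawn from the posterior. Planning then happens in a merged MDP where each state offers every (sampled model, action) pair, which makes the plan optimistic with respect to the samples. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that rewards are known, that each transition distribution has a Dirichlet posterior over counts, and that planning is plain value iteration. The function names are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    # One Dirichlet draw via normalized Gamma(alpha, 1) variates.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def boss_plan(counts, rewards, num_samples, gamma, rng, iters=200):
    """counts[s][a][s2]: Dirichlet prior plus observed transition counts (assumed).
    rewards[s][a]: reward, assumed known in this sketch.
    Returns (greedy action per state, state values) of the merged MDP."""
    n_s, n_a = len(counts), len(counts[0])
    # Draw num_samples complete transition models from the posterior.
    models = [[[sample_dirichlet(counts[s][a], rng) for a in range(n_a)]
               for s in range(n_s)] for _ in range(num_samples)]

    def q(s, k, a, V):
        # One-step lookahead when taking action a under sampled model k.
        return rewards[s][a] + gamma * sum(p * v for p, v in zip(models[k][s][a], V))

    pairs = [(k, a) for k in range(num_samples) for a in range(n_a)]
    V = [0.0] * n_s
    for _ in range(iters):
        # Merged MDP: in each state the agent may pick any (model, action) pair,
        # so the backup takes the best value over all sampled models.
        V = [max(q(s, k, a, V) for k, a in pairs) for s in range(n_s)]
    policy = [max(pairs, key=lambda ka: q(s, ka[0], ka[1], V))[1] for s in range(n_s)]
    return policy, V

# Usage: two states, uniform prior counts; action 1 always pays 1, action 0 pays 0.
rng = random.Random(0)
counts = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
rewards = [[0.0, 1.0], [0.0, 1.0]]
policy, V = boss_plan(counts, rewards, num_samples=3, gamma=0.9, rng=rng)
```

A full agent would also need to decide when to draw a fresh set of samples as data arrives. This sketch omits any such rule.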

[PDF] The influence of reward on the speed of reinforcement learning: An analysis of shaping

A Laud, G DeJong - Proceedings of the 20th International Conference …, 2003 - cdn.aaai.org
Shaping can be an effective method for improving the learning rate in reinforcement
systems. Previously, shaping has been heuristically motivated and implemented. We …

Reinforcement learning with multiple experts: A Bayesian model combination approach

M Gimelfarb, S Sanner, CG Lee - Advances in neural …, 2018 - proceedings.neurips.cc
Potential-based reward shaping is a powerful technique for accelerating convergence of
reinforcement learning algorithms. Typically, such information includes an estimate of the …
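Potential-based shaping adds the term F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is a potential function over states. Along any episode these terms telescope. When the terminal potential is zero, the shaped discounted return therefore differs from the original by exactly −Φ(s0), which is why shaping of this form leaves the optimal policy unchanged. The sketch below checks this on a toy chain. The chain, the distance-based potential `phi`, and the helper names are illustrative assumptions. The code shows only the generic shaping term, not the Bayesian combination of multiple experts proposed in the paper.

```python
def shaped_reward(r, s, s_next, potential, gamma, terminal=False):
    """Environment reward plus the potential-based term gamma * Phi(s') - Phi(s).
    The potential of a terminal successor is taken to be 0."""
    phi_next = 0.0 if terminal else potential(s_next)
    return r + gamma * phi_next - potential(s)

def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Toy chain (hypothetical): states 0..4, reward 1 only on the step into goal state 4.
GOAL = 4

def phi(s):
    return -abs(GOAL - s)  # illustrative potential: negative distance to the goal

gamma = 0.9
episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 0.0, 3, False), (3, 1.0, 4, True)]
raw = [r for (_, r, _, _) in episode]
shaped = [shaped_reward(r, s, s2, phi, gamma, done) for (s, r, s2, done) in episode]
# Telescoping: the shaped return exceeds the raw one by -phi(0) = 4, up to rounding.
gap = discounted_return(shaped, gamma) - discounted_return(raw, gamma)
```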

[BOOK] Shaping and policy search in reinforcement learning

AY Ng - 2003 - search.proquest.com
To make reinforcement learning algorithms run in a reasonable amount of time, it is
frequently necessary to use a well-chosen reward function that gives appropriate “hints” to …

Temporal-logic-based reward shaping for continuing reinforcement learning tasks

Y Jiang, S Bharadwaj, B Wu, R Shah, U Topcu… - Proceedings of the …, 2021 - ojs.aaai.org
In continuing tasks, average-reward reinforcement learning may be a more appropriate
problem formulation than the more common discounted reward formulation. As usual …
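The average-reward formulation has no discount factor. Instead, the agent estimates the long-run reward per step, r̄, and learns differential values relative to it, using the TD error δ = r − r̄ + V(s') − V(s). The tabular sketch below applies this standard differential TD(0) update to a fixed policy on a two-state cycle whose average reward is 2. It illustrates only the problem formulation, not the temporal-logic-based shaping method of the paper. The step sizes, the function name, and the encoding of the policy's transitions are assumptions.

```python
def differential_td(transitions, n_states, alpha=0.1, beta=0.01, steps=5000, start=0):
    """Tabular differential TD(0) under a fixed policy in a continuing task.
    transitions[s] = (reward, next_state) is an assumed deterministic encoding."""
    V = [0.0] * n_states  # differential state values
    avg_reward = 0.0      # running estimate of the reward per step
    s = start
    for _ in range(steps):
        r, s_next = transitions[s]
        # Undiscounted TD error, measured relative to the average reward.
        delta = r - avg_reward + V[s_next] - V[s]
        avg_reward += beta * delta
        V[s] += alpha * delta
        s = s_next
    return V, avg_reward

# Usage: a two-state cycle 0 -> 1 -> 0 with rewards 1 and 3 (average reward 2).
V, avg = differential_td({0: (1.0, 1), 1: (3.0, 0)}, n_states=2)
```

Differential values are only defined up to an additive constant, so only their differences are meaningful. Here V(1) − V(0) settles near 1.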

Near optimal reward-free reinforcement learning

Z Zhang, S Du, X Ji - International Conference on Machine …, 2021 - proceedings.mlr.press
We study the reward-free reinforcement learning framework, which is particularly suitable for
batch reinforcement learning and scenarios where one needs policies for multiple reward …

The optimal reward baseline for gradient-based reinforcement learning

L Weaver, N Tao - arXiv preprint arXiv:1301.2315, 2013 - arxiv.org
There exist a number of reinforcement learning algorithms which learn by climbing the
gradient of expected reward. Their long-run convergence has been proved, even in partially …
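A two-armed bandit with a softmax policy is enough to show what a reward baseline does in such gradient estimators. The score-function estimate g(a) = (r_a − b) ∇ log π(a) has the same expectation for any constant b, but its variance depends on b. The sketch below enumerates both arms to compute that mean and variance exactly. The bandit, its rewards, and the choice of the policy's expected reward as b are illustrative assumptions. This is not the analysis or the optimal baseline derived in the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def gradient_stats(logits, rewards, baseline):
    """Exact mean and total variance of g(a) = (r_a - b) * grad log pi(a)
    for a softmax bandit policy with deterministic per-arm rewards."""
    pi = softmax(logits)
    n = len(pi)
    per_arm = []
    for a in range(n):
        # For a softmax policy, grad_theta log pi(a) = onehot(a) - pi.
        score = [(1.0 if j == a else 0.0) - pi[j] for j in range(n)]
        per_arm.append([(rewards[a] - baseline) * s for s in score])
    mean = [sum(pi[a] * per_arm[a][j] for a in range(n)) for j in range(n)]
    var = sum(pi[a] * sum((per_arm[a][j] - mean[j]) ** 2 for j in range(n))
              for a in range(n))
    return mean, var

# Usage: the expected gradient is the same with and without a baseline.
logits, rewards = [0.0, 0.0], [1.0, 2.0]
b = sum(p * r for p, r in zip(softmax(logits), rewards))  # expected reward, 1.5
mean_0, var_0 = gradient_stats(logits, rewards, baseline=0.0)  # var_0 is 1.125
mean_b, var_b = gradient_stats(logits, rewards, baseline=b)    # var_b is 0.0
```

In this symmetric example the expected-reward baseline happens to remove the variance entirely. In general, a constant baseline only changes the variance, and a poorly chosen one can increase it.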

Learning values across many orders of magnitude

HP Van Hasselt, A Guez, M Hessel… - Advances in neural …, 2016 - proceedings.neurips.cc
Most learning algorithms are not invariant to the scale of the signal that is being
approximated. We propose to adaptively normalize the targets used in the learning updates …
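The paper calls its method Pop-Art, for "preserving outputs precisely, while adaptively rescaling targets." It can be sketched for a single scalar output. The sketch keeps running estimates of the targets' mean and scale and trains on normalized targets. Whenever the statistics change, it rescales the final linear layer so that unnormalized predictions stay the same. This is a minimal illustration, not the authors' code. The class name, the moving-average step size `beta`, the `eps` floor, and the plain-list representation of the layer are assumptions.

```python
import math

class AdaptiveTargetNormalizer:
    """Running target statistics plus output-preserving rescaling of a final
    linear layer y = w.h + b whose output is read in normalized units."""

    def __init__(self, beta=0.01, eps=1e-8):
        self.beta = beta  # step size of the moving averages (assumed value)
        self.eps = eps    # floor on the variance estimate
        self.mu = 0.0     # running mean of targets
        self.nu = 1.0     # running mean of squared targets

    @property
    def sigma(self):
        return math.sqrt(max(self.nu - self.mu ** 2, self.eps))

    def update(self, target, w, b):
        """Fold a new target into the statistics and return the rescaled (w, b)."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Pick w', b' so new_sigma * (w'.h + b') + mu' == old_sigma * (w.h + b) + old_mu.
        w_new = [wi * old_sigma / new_sigma for wi in w]
        b_new = (old_sigma * b + old_mu - self.mu) / new_sigma
        return w_new, b_new

    def normalize(self, target):
        # Value the output layer is trained to regress onto.
        return (target - self.mu) / self.sigma

    def denormalize(self, y):
        return self.sigma * y + self.mu


def linear(w, b, h):
    return sum(wi * hi for wi, hi in zip(w, h)) + b


# Usage: a large target shifts the statistics, but the unnormalized prediction is kept.
norm = AdaptiveTargetNormalizer(beta=0.1)
h, w, b = [1.0, 2.0], [0.5, -0.25], 0.1
before = norm.denormalize(linear(w, b, h))
w, b = norm.update(100.0, w, b)
after = norm.denormalize(linear(w, b, h))
```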

Counterfactual credit assignment in model-free reinforcement learning

T Mesnard, T Weber, F Viola, S Thakoor… - arXiv preprint arXiv …, 2020 - arxiv.org
Credit assignment in reinforcement learning is the problem of measuring an action's
influence on future rewards. In particular, this requires separating skill from luck, i.e. …