Related articles - Academic resource search

Belief reward shaping in reinforcement learning

O Marom, B Rosman - Proceedings of the AAAI conference on artificial …, 2018 - ojs.aaai.org

A key challenge in many reinforcement learning problems is delayed rewards, which can
significantly slow down learning. Although reward shaping has previously been introduced …

Cited by 81 · Related articles · All 11 versions

[PDF] arxiv.org

A Bayesian sampling approach to exploration in reinforcement learning

J Asmuth, L Li, ML Littman, A Nouri… - arXiv preprint arXiv …, 2012 - arxiv.org

We present a modular approach to reinforcement learning that uses a Bayesian
representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set) …

Cited by 202 · Related articles · All 20 versions

[PDF] aaai.org

[PDF] The influence of reward on the speed of reinforcement learning: An analysis of shaping

A Laud, G DeJong - Proceedings of the 20th International Conference …, 2003 - cdn.aaai.org

Shaping can be an effective method for improving the learning rate in reinforcement
systems. Previously, shaping has been heuristically motivated and implemented. We …

Cited by 74 · Related articles · All 5 versions

[PDF] neurips.cc

Reinforcement learning with multiple experts: A bayesian model combination approach

M Gimelfarb, S Sanner, CG Lee - Advances in neural …, 2018 - proceedings.neurips.cc

Potential based reward shaping is a powerful technique for accelerating convergence of
reinforcement learning algorithms. Typically, such information includes an estimate of the …

Cited by 29 · Related articles · All 6 versions

[PDF] berkeley.edu

[BOOK] Shaping and policy search in reinforcement learning

AY Ng - 2003 - search.proquest.com

To make reinforcement learning algorithms run in a reasonable amount of time, it is
frequently necessary to use a well-chosen reward function that gives appropriate “hints” to …

Cited by 154 · Related articles · All 7 versions

[PDF] aaai.org

Temporal-logic-based reward shaping for continuing reinforcement learning tasks

Y Jiang, S Bharadwaj, B Wu, R Shah, U Topcu… - Proceedings of the …, 2021 - ojs.aaai.org

In continuing tasks, average-reward reinforcement learning may be a more appropriate
problem formulation than the more common discounted reward formulation. As usual …

Cited by 51 · Related articles · All 13 versions

[PDF] mlr.press

Near optimal reward-free reinforcement learning

Z Zhang, S Du, X Ji - International Conference on Machine …, 2021 - proceedings.mlr.press

We study the reward-free reinforcement learning framework, which is particularly suitable for
batch reinforcement learning and scenarios where one needs policies for multiple reward …

Cited by 26 · Related articles · All 3 versions

[PDF] arxiv.org

The optimal reward baseline for gradient-based reinforcement learning

L Weaver, N Tao - arXiv preprint arXiv:1301.2315, 2013 - arxiv.org

There exist a number of reinforcement learning algorithms which learn by climbing the
gradient of expected reward. Their long-run convergence has been proved, even in partially …

Cited by 310 · Related articles · All 10 versions

[PDF] neurips.cc

Learning values across many orders of magnitude

HP Van Hasselt, A Guez, M Hessel… - Advances in neural …, 2016 - proceedings.neurips.cc

Most learning algorithms are not invariant to the scale of the signal that is being
approximated. We propose to adaptively normalize the targets used in the learning updates …

Cited by 191 · Related articles · All 5 versions

[PDF] arxiv.org

Counterfactual credit assignment in model-free reinforcement learning

T Mesnard, T Weber, F Viola, S Thakoor… - arXiv preprint arXiv …, 2020 - arxiv.org

Credit assignment in reinforcement learning is the problem of measuring an action's
influence on future rewards. In particular, this requires separating skill from luck, i.e. …

Cited by 67 · Related articles · All 6 versions
