Data-efficient off-policy policy evaluation for reinforcement learning

P Thomas, E Brunskill - International Conference on …, 2016 - proceedings.mlr.press
In this paper we present a new way of predicting the performance of a reinforcement
learning policy given historical data that may have been generated by a different policy. The …
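
The baseline that such off-policy estimators build on is ordinary importance sampling: reweight each logged trajectory's return by the likelihood ratio between the evaluation and behaviour policies. A minimal sketch of that baseline (not this paper's estimator; the function and policy signatures are illustrative assumptions):

```python
def importance_sampling_ope(episodes, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling estimate of pi_e's value from
    trajectories collected under a behaviour policy pi_b.

    episodes: list of trajectories, each a list of (state, action, reward)
    pi_e, pi_b: callables (state, action) -> action probability
    """
    total = 0.0
    for ep in episodes:
        weight, ret, disc = 1.0, 0.0, 1.0
        for s, a, r in ep:
            weight *= pi_e(s, a) / pi_b(s, a)  # likelihood ratio
            ret += disc * r                     # discounted return
            disc *= gamma
        total += weight * ret
    return total / len(episodes)
```

The product of likelihood ratios makes this estimator unbiased but high-variance, which is exactly what motivates the more data-efficient estimators studied in work like this.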

Meta-gradient reinforcement learning

Z Xu, HP van Hasselt, D Silver - Advances in neural …, 2018 - proceedings.neurips.cc
The goal of reinforcement learning algorithms is to estimate and/or optimise the value
function. However, unlike supervised learning, no teacher or oracle is available to provide …

On Monte Carlo tree search and reinforcement learning

T Vodopivec, S Samothrakis, B Ster - Journal of Artificial Intelligence …, 2017 - jair.org
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread
adoption within the games community. Its links to traditional reinforcement learning …

Fast efficient hyperparameter tuning for policy gradient methods

S Paul, V Kurin, S Whiteson - Advances in Neural …, 2019 - proceedings.neurips.cc
The performance of policy gradient methods is sensitive to hyperparameter settings that
must be tuned for any new application. Widely used grid search methods for tuning …

Automated reinforcement learning (AutoRL): A survey and open problems

J Parker-Holder, R Rajan, X Song, A Biedenkapp… - Journal of Artificial …, 2022 - jair.org
The combination of Reinforcement Learning (RL) with deep learning has led to a
series of impressive feats, with many believing (deep) RL provides a path towards generally …

ε-BMC: A Bayesian ensemble approach to epsilon-greedy exploration in model-free reinforcement learning

M Gimelfarb, S Sanner, CG Lee - arXiv preprint arXiv:2007.00869, 2020 - arxiv.org
Resolving the exploration-exploitation trade-off remains a fundamental problem in the
design and implementation of reinforcement learning (RL) algorithms. In this paper, we …
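
The rule this line of work adapts is classic ε-greedy: with probability ε pick a uniformly random action, otherwise pick the greedy one. A minimal sketch with a fixed ε (the paper's contribution is choosing ε online via a Bayesian ensemble; this is only the baseline, and the names are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Standard epsilon-greedy action selection over a list of Q-values:
    explore uniformly with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```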

A greedy approach to adapting the trace parameter for temporal difference learning

M White, A White - arXiv preprint arXiv:1607.00446, 2016 - arxiv.org
One of the main obstacles to broad application of reinforcement learning methods is the
parameter sensitivity of our core learning algorithms. In many large-scale applications …

Fast efficient hyperparameter tuning for policy gradients

S Paul, V Kurin, S Whiteson - arXiv preprint arXiv:1902.06583, 2019 - arxiv.org
The performance of policy gradient methods is sensitive to hyperparameter settings that
must be tuned for any new application. Widely used grid search methods for tuning …

Reinforcement learning with multiple experts: A Bayesian model combination approach

M Gimelfarb, S Sanner, CG Lee - Advances in neural …, 2018 - proceedings.neurips.cc
Potential based reward shaping is a powerful technique for accelerating convergence of
reinforcement learning algorithms. Typically, such information includes an estimate of the …
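
Potential-based shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which is known to leave the optimal policy unchanged (Ng, Harada & Russell, 1999). A minimal sketch, with the potential function Φ supplied by the caller (an illustrative helper, not this paper's method of combining multiple shaping experts):

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99, terminal=False):
    """Potential-based reward shaping: r + gamma * Phi(s') - Phi(s).

    potential: callable state -> float, the shaping potential Phi.
    The potential of a terminal state is taken to be 0, as required
    for policy invariance in episodic tasks.
    """
    phi_next = 0.0 if terminal else potential(s_next)
    return r + gamma * phi_next - potential(s)
```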

On the Rate of Convergence and Error Bounds for LSTD(λ)

M Tagorti, B Scherrer - International Conference on Machine …, 2015 - proceedings.mlr.press
We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces
proposed by Boyan (2002). It computes a linear approximation of the value …
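
LSTD(λ) accumulates an eligibility-trace-weighted least-squares system A θ = b over observed transitions and then solves it for the linear value-function weights. A minimal sketch on a toy chain with tabular features (an illustrative reimplementation, not the authors' code):

```python
import numpy as np

def lstd_lambda(transitions, n_features, gamma, lam):
    """One pass of LSTD(lambda) over (phi, r, phi_next) transitions.

    Builds A = sum_t z_t (phi_t - gamma * phi_{t+1})^T and
    b = sum_t z_t r_t, with trace z_t = gamma*lam*z_{t-1} + phi_t,
    then solves A theta = b. phi_next is the zero vector at termination.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    z = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        z = gamma * lam * z + phi
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
    return np.linalg.solve(A, b)

# Toy 3-state chain 0 -> 1 -> 2 -> terminal, reward 1 per step,
# one-hot (tabular) features, gamma = 0.9.
e = np.eye(3)
episode = [(e[0], 1.0, e[1]), (e[1], 1.0, e[2]), (e[2], 1.0, np.zeros(3))]
theta = lstd_lambda(episode, 3, gamma=0.9, lam=0.7)
# theta recovers the exact discounted returns [2.71, 1.9, 1.0]
```

On this deterministic episode the Bellman equations are exactly satisfiable, so any λ yields the same solution; the paper's convergence-rate and error-bound analysis concerns the general sampled, approximate setting.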