Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning

J Hu, S Jiang, SA Harding, H Wu, S Liao - arXiv preprint arXiv:2102.03479, 2021 - arxiv.org
Many complex multi-agent systems, such as robot swarm control and autonomous vehicle
coordination, can be modeled as Multi-Agent Reinforcement Learning (MARL) tasks. QMIX, a …
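
QMIX's monotonicity constraint, ∂Q_tot/∂Q_a ≥ 0 for every agent a, is typically enforced by generating non-negative mixing weights from state-conditioned hypernetworks. A minimal PyTorch sketch of such a monotonic mixer (layer sizes and names are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot with non-negative weights,
    so dQ_tot/dQ_a >= 0 (the QMIX monotonicity constraint)."""
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks: the global state generates the mixing weights.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)
        h = torch.relu(agent_qs.unsqueeze(1) @ w1 + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)
        return (h @ w2).squeeze(-1) + self.b2(state)  # Q_tot: (batch, 1)
```

Taking the absolute value of the hypernetwork outputs is what guarantees the non-negative mixing weights; the biases remain unconstrained.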

The phenomenon of policy churn

T Schaul, A Barreto, J Quan… - Advances in Neural …, 2022 - proceedings.neurips.cc
We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …
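
Churn here refers to the greedy action changing at a state from one gradient update to the next. A hypothetical NumPy sketch of how one could measure it on a batch of held-out states (the function names and interface are mine, not the paper's):

```python
import numpy as np

def greedy_actions(q_values: np.ndarray) -> np.ndarray:
    """q_values: (n_states, n_actions) -> greedy action per state."""
    return q_values.argmax(axis=1)

def churn_fraction(q_before: np.ndarray, q_after: np.ndarray) -> float:
    """Fraction of states whose greedy action changed between two
    snapshots of the Q-function (before/after a gradient update)."""
    return float(np.mean(greedy_actions(q_before) != greedy_actions(q_after)))
```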

Averaging n-step Returns Reduces Variance in Reinforcement Learning

B Daley, M White, MC Machado - Forty-first International …, 2024 - openreview.net
Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to
improve the sample efficiency of reinforcement learning (RL) methods. The variance of the …
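
As a reminder of the objects being averaged: the $n$-step return bootstraps from the value estimate after $n$ rewards, and the $\lambda$-return is their geometric mixture with weights $(1-\lambda)\lambda^{n-1}$. A tabular NumPy sketch under these standard definitions (variable names are mine; the paper's compound returns generalize this weighting):

```python
import numpy as np

def n_step_return(rewards, values, t, n, gamma):
    """G_t^(n) = sum_{k=0}^{n-1} gamma^k R_{t+k} + gamma^n V(S_{t+n})."""
    G = sum(gamma**k * rewards[t + k] for k in range(n))
    return G + gamma**n * values[t + n]

def lambda_return(rewards, values, t, gamma, lam):
    """G_t^lambda = (1 - lam) * sum_n lam^(n-1) G_t^(n), truncated at the
    end of the trajectory (remaining weight goes to the final return)."""
    T = len(rewards)  # rewards[t..T-1]; values must have length T + 1
    G, weight_sum = 0.0, 0.0
    for n in range(1, T - t):
        w = (1 - lam) * lam ** (n - 1)
        G += w * n_step_return(rewards, values, t, n, gamma)
        weight_sum += w
    # remaining probability mass lam^(T-t-1) goes to the full return
    G += (1 - weight_sum) * n_step_return(rewards, values, t, T - t, gamma)
    return G
```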

Opportunities and challenges from using animal videos in reinforcement learning for navigation

V Giammarino, J Queeney, LC Carstensen… - IFAC-PapersOnLine, 2023 - Elsevier
We investigate the use of animal videos (observations) to improve Reinforcement Learning
(RL) efficiency and performance in navigation tasks with sparse rewards. Motivated by …

Trajectory-aware eligibility traces for off-policy reinforcement learning

B Daley, M White, C Amato… - … on Machine Learning, 2023 - proceedings.mlr.press
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement
learning, but counteracting off-policy bias without exacerbating variance is challenging …
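
The standard baseline that such trajectory-aware traces refine is per-decision importance weighting with truncated ratios, as in Retrace($\lambda$), where $c_s = \lambda \min(1, \pi(a_s|x_s)/\mu(a_s|x_s))$. A sketch of that classic correction, not the paper's trajectory-aware variant:

```python
import numpy as np

def retrace_correction(td_errors, pi_probs, mu_probs, gamma, lam):
    """Off-policy-corrected multistep error for time 0:
    sum_t gamma^t (prod_{s=1}^{t} c_s) * delta_t,
    with c_s = lam * min(1, pi/mu) (truncated per-decision ratios)."""
    c = lam * np.minimum(1.0, pi_probs / mu_probs)
    correction, trace = 0.0, 1.0
    for t, delta in enumerate(td_errors):
        if t > 0:
            trace *= c[t]  # the product over c_s starts at s = 1
        correction += (gamma ** t) * trace * delta
    return correction
```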

The nature of temporal difference errors in multi-step distributional reinforcement learning

Y Tang, R Munos, M Rowland… - Advances in …, 2022 - proceedings.neurips.cc
We study the multi-step off-policy learning approach to distributional RL. Despite the
apparent similarity between value-based RL and distributional RL, our study reveals …
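
For context, the $n$-step distributional Bellman backup bootstraps the entire return distribution rather than its mean; in standard notation (my transcription, not the paper's exact formulation):

```latex
% n-step distributional target: equality in distribution, with Z the
% return-distribution function and gamma the discount factor.
Z(x_t, a_t) \stackrel{D}{=} \sum_{k=0}^{n-1} \gamma^k R_{t+k}
  \;+\; \gamma^n Z\bigl(x_{t+n}, a_{t+n}\bigr)
```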

Demystifying the Recency Heuristic in Temporal-Difference Learning

B Daley, MC Machado, M White - arXiv preprint arXiv:2406.12284, 2024 - arxiv.org
The recency heuristic in reinforcement learning is the assumption that stimuli that occurred
closer in time to an acquired reward should be more heavily reinforced. The recency …
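
The recency heuristic is precisely what TD($\lambda$)'s eligibility traces implement: credit for each TD error decays geometrically in the time since a state was visited. A standard tabular TD($\lambda$) sketch making the decay explicit (variable names are mine):

```python
import numpy as np

def td_lambda_update(V, states, rewards, gamma, lam, alpha):
    """One-episode tabular TD(lambda) with accumulating traces.
    e[s] decays by gamma*lam each step, so states visited longer
    ago (farther from the reward) receive geometrically less credit."""
    e = np.zeros_like(V)
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        delta = rewards[t] + gamma * V[s_next] - V[s]
        e *= gamma * lam       # recency: decay all existing traces
        e[s] += 1.0            # refresh the trace of the current state
        V += alpha * delta * e
    return V
```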

Variational oracle guiding for reinforcement learning

D Han, T Kozuno, X Luo, ZY Chen, K Doya… - International …, 2022 - openreview.net
How to make intelligent decisions is a central problem in machine learning and artificial
intelligence. Despite recent successes of deep reinforcement learning (RL) in various …

Explaining off-policy actor-critic from a bias-variance perspective

TH Fan, PJ Ramadge - arXiv preprint arXiv:2110.02421, 2021 - arxiv.org
Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance
but still require better explanations. To this end, we show that their policy evaluation error on the …

DoMo-AC: doubly multi-step off-policy actor-critic algorithm

Y Tang, T Kozuno, M Rowland… - International …, 2023 - proceedings.mlr.press
Multi-step learning applies lookahead over multiple time steps and has proved valuable in
policy evaluation settings. However, in the optimal control case, the impact of multi-step …