Emphatic algorithms for deep reinforcement learning

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier

Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …

Cited by 104 Related articles All 2 versions

[PDF] arxiv.org

Alphastar unplugged: Large-scale offline reinforcement learning

M Mathieu, S Ozair, S Srinivasan, C Gulcehre… - arXiv preprint arXiv …, 2023 - arxiv.org

StarCraft II is one of the most challenging simulated reinforcement learning environments; it
is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic …

Cited by 8 Related articles All 2 versions

[PDF] arxiv.org

A survey of temporal credit assignment in deep reinforcement learning

E Pignatelli, J Ferret, M Geist, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org

The Credit Assignment Problem (CAP) refers to the longstanding challenge of
Reinforcement Learning (RL) agents to associate actions with their long-term …

Cited by 5 Related articles All 3 versions

[PDF] arxiv.org

Spatio-temporal graph convolutional neural networks for physics-aware grid learning algorithms

T Wu, IL Carreño, A Scaglione… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

This paper proposes novel architectures for spatio-temporal graph convolutional and
recurrent neural networks whose structure is inspired by the physics of power systems. The …

Cited by 16 Related articles All 4 versions

[PDF] openreview.net

Starcraft ii unplugged: Large scale offline reinforcement learning

M Mathieu, S Ozair, S Srinivasan… - Deep RL Workshop …, 2021 - openreview.net

StarCraft II is one of the most challenging reinforcement learning (RL) environments; it is
partially observable, stochastic, and multi-agent, and mastering StarCraft II requires strategic …

Cited by 18 Related articles

[PDF] aaai.org

Learning expected emphatic traces for deep RL

R Jiang, S Zhang, V Chelu, A White… - Proceedings of the AAAI …, 2022 - ojs.aaai.org

Off-policy sampling and experience replay are key for improving sample efficiency and
scaling model-free temporal difference learning methods. When combined with function …

Cited by 15 Related articles All 7 versions

[PDF] jmlr.org

Truncated emphatic temporal difference methods for prediction and control

S Zhang, S Whiteson - Journal of Machine Learning Research, 2022 - jmlr.org

Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning
(RL) methods involving the use of followon traces. Despite the theoretical success of …

Cited by 10 Related articles All 5 versions

[PDF] openreview.net

Imitation from arbitrary experience: A dual unification of reinforcement and imitation learning methods

H Sikchi, A Zhang, S Niekum - Workshop on Reincarnating …, 2023 - openreview.net

It is well known that Reinforcement Learning (RL) can be formulated as a convex program
with linear constraints. The dual form of this formulation is unconstrained, which we refer to …

Cited by 5 Related articles

[PDF] arxiv.org

Improving offline rl by blending heuristics

S Geng, A Pacchiano, A Kolobov, CA Cheng - arXiv preprint arXiv …, 2023 - arxiv.org

We propose Heuristic Blending (HUBL), a simple performance-improving technique for a
broad class of offline RL algorithms based on value bootstrapping. HUBL modifies the …

Cited by 3 Related articles All 3 versions

[PDF] neurips.cc

Adaptive interest for emphatic reinforcement learning

M Klissarov, R Fakoor, JW Mueller… - Advances in …, 2022 - proceedings.neurips.cc

Emphatic algorithms have shown great promise in stabilizing and improving reinforcement
learning by selectively emphasizing the update rule. Although the emphasis fundamentally …

Cited by 3 Related articles All 5 versions
