Reinforcement learning algorithms: A brief survey

AK Shakya, G Pillai, S Chakrabarty - Expert Systems with Applications, 2023 - Elsevier
Reinforcement Learning (RL) is a machine learning (ML) technique to learn sequential
decision-making in complex problems. RL is inspired by trial-and-error based human/animal …

AlphaStar unplugged: Large-scale offline reinforcement learning

M Mathieu, S Ozair, S Srinivasan, C Gulcehre… - arXiv preprint arXiv …, 2023 - arxiv.org
StarCraft II is one of the most challenging simulated reinforcement learning environments; it
is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic …

A survey of temporal credit assignment in deep reinforcement learning

E Pignatelli, J Ferret, M Geist, T Mesnard… - arXiv preprint arXiv …, 2023 - arxiv.org
The Credit Assignment Problem (CAP) refers to the longstanding challenge of
Reinforcement Learning (RL) agents to associate actions with their long-term …

Spatio-temporal graph convolutional neural networks for physics-aware grid learning algorithms

T Wu, IL Carreño, A Scaglione… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This paper proposes novel architectures for spatio-temporal graph convolutional and
recurrent neural networks whose structure is inspired by the physics of power systems. The …

StarCraft II unplugged: Large scale offline reinforcement learning

M Mathieu, S Ozair, S Srinivasan… - Deep RL Workshop …, 2021 - openreview.net
StarCraft II is one of the most challenging reinforcement learning (RL) environments; it is
partially observable, stochastic, and multi-agent, and mastering StarCraft II requires strategic …

Learning expected emphatic traces for deep RL

R Jiang, S Zhang, V Chelu, A White… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Off-policy sampling and experience replay are key for improving sample efficiency and
scaling model-free temporal difference learning methods. When combined with function …

Truncated emphatic temporal difference methods for prediction and control

S Zhang, S Whiteson - Journal of Machine Learning Research, 2022 - jmlr.org
Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning
(RL) methods involving the use of followon traces. Despite the theoretical success of …

Imitation from arbitrary experience: A dual unification of reinforcement and imitation learning methods

H Sikchi, A Zhang, S Niekum - Workshop on Reincarnating …, 2023 - openreview.net
It is well known that Reinforcement Learning (RL) can be formulated as a convex program
with linear constraints. The dual form of this formulation is unconstrained, which we refer to …

Improving offline RL by blending heuristics

S Geng, A Pacchiano, A Kolobov, CA Cheng - arXiv preprint arXiv …, 2023 - arxiv.org
We propose Heuristic Blending (HUBL), a simple performance-improving technique for a
broad class of offline RL algorithms based on value bootstrapping. HUBL modifies the …

Adaptive interest for emphatic reinforcement learning

M Klissarov, R Fakoor, JW Mueller… - Advances in …, 2022 - proceedings.neurips.cc
Emphatic algorithms have shown great promise in stabilizing and improving reinforcement
learning by selectively emphasizing the update rule. Although the emphasis fundamentally …