Hierarchical reinforcement learning: A survey and open research challenges

M Hutsebaut-Buysse, K Mets, S Latré - Machine Learning and Knowledge …, 2022 - mdpi.com
Reinforcement learning (RL) allows an agent to solve sequential decision-making problems
by interacting with an environment in a trial-and-error fashion. When these environments are …

Large sequence models for sequential decision-making: a survey

M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai… - Frontiers of Computer …, 2023 - Springer
Transformer architectures have facilitated the development of large-scale and general-
purpose sequence models for prediction tasks in natural language processing and computer …

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

Agent57: Outperforming the Atari human benchmark

AP Badia, B Piot, S Kapturowski… - International …, 2020 - proceedings.mlr.press
Atari games have been a long-standing benchmark in the reinforcement learning (RL)
community for the past decade. This benchmark was proposed to test general competency …

Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving

J Wu, Z Huang, Z Hu, C Lv - Engineering, 2023 - Elsevier
Due to its limited intelligence and abilities, machine learning is currently unable to handle
various situations and thus cannot completely replace humans in real-world applications …

Recurrent model-free RL can be a strong baseline for many POMDPs

T Ni, B Eysenbach, R Salakhutdinov - arXiv preprint arXiv:2110.05038, 2021 - arxiv.org
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …

RUDDER: Return decomposition for delayed rewards

JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …

Dual credit assignment processes underlie dopamine signals in a complex spatial environment

TA Krausz, AE Comrie, AE Kahn, LM Frank, ND Daw… - Neuron, 2023 - cell.com
Animals frequently make decisions based on expectations of future reward ("values").
Values are updated by ongoing experience: places and choices that result in reward are …

Counterfactual credit assignment in model-free reinforcement learning

T Mesnard, T Weber, F Viola, S Thakoor… - arXiv preprint arXiv …, 2020 - arxiv.org
Credit assignment in reinforcement learning is the problem of measuring an action's
influence on future rewards. In particular, this requires separating skill from luck, i.e. …

Dense reward for free in reinforcement learning from human feedback

AJ Chan, H Sun, S Holt, M van der Schaar - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has been credited as the key
advance that has allowed Large Language Models (LLMs) to effectively follow instructions …