Hierarchical reinforcement learning: A survey and open research challenges

M Hutsebaut-Buysse, K Mets, S Latré - Machine Learning and Knowledge …, 2022 - mdpi.com
Reinforcement learning (RL) allows an agent to solve sequential decision-making problems
by interacting with an environment in a trial-and-error fashion. When these environments are …

Large sequence models for sequential decision-making: a survey

M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai… - Frontiers of Computer …, 2023 - Springer
Transformer architectures have facilitated the development of large-scale and general-
purpose sequence models for prediction tasks in natural language processing and computer …

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

Agent57: Outperforming the Atari human benchmark

AP Badia, B Piot, S Kapturowski… - International …, 2020 - proceedings.mlr.press
Atari games have been a long-standing benchmark in the reinforcement learning (RL)
community for the past decade. This benchmark was proposed to test general competency …

Toward human-in-the-loop AI: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving

J Wu, Z Huang, Z Hu, C Lv - Engineering, 2023 - Elsevier
Due to its limited intelligence and abilities, machine learning is currently unable to handle
various situations and thus cannot completely replace humans in real-world applications …

Recurrent model-free RL can be a strong baseline for many POMDPs

T Ni, B Eysenbach, R Salakhutdinov - arXiv preprint arXiv:2110.05038, 2021 - arxiv.org
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …

RUDDER: Return decomposition for delayed rewards

JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …

Dual credit assignment processes underlie dopamine signals in a complex spatial environment

TA Krausz, AE Comrie, AE Kahn, LM Frank, ND Daw… - Neuron, 2023 - cell.com
Animals frequently make decisions based on expectations of future reward ("values").
Values are updated by ongoing experience: places and choices that result in reward are …

Counterfactual credit assignment in model-free reinforcement learning

T Mesnard, T Weber, F Viola, S Thakoor… - arXiv preprint arXiv …, 2020 - arxiv.org
Credit assignment in reinforcement learning is the problem of measuring an action's
influence on future rewards. In particular, this requires separating skill from luck, i.e. …

Dense reward for free in reinforcement learning from human feedback

AJ Chan, H Sun, S Holt, M van der Schaar - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has been credited as the key
advance that has allowed Large Language Models (LLMs) to effectively follow instructions …