Generative exploration and exploitation

C Wang, H Zhou, Y Hu, Y Huo, B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org

Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …

Cited by 4 · Related articles · All 6 versions

Optimal bipartite graph matching-based goal selection for policy-based hindsight learning

S Sun, H Zhang, Z Liu, X Chen, X Lan - Neurocomputing, 2024 - Elsevier

The sparse reward problem stands as a significant challenge in the field of reinforcement
learning. Hindsight Experience Replay (HER) addresses this by goal relabeling, allowing …

Remax: Relational representation for multi-agent exploration

H Ryu, H Shin, J Park - … of the 21st International Conference on …, 2022 - dl.acm.org

Training a multi-agent reinforcement learning (MARL) model with a sparse reward is
generally difficult because numerous combinations of interactions among agents induce a …

Cited by 6 · Related articles · All 5 versions

[PDF] arxiv.org

SEREN: Knowing When to Explore and When to Exploit

C Yu, D Mguni, D Li, A Sootla, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that
maximise expected reward and "explorative" ones that sample unvisited states. To …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Density-based curriculum for multi-goal reinforcement learning with sparse rewards

D Yang, H Zhang, X Lan, J Ding - arXiv preprint arXiv:2109.08903, 2021 - arxiv.org

Multi-goal reinforcement learning (RL) aims to qualify the agent to accomplish multi-goal
tasks, which is of great importance in learning scalable robotic manipulation skills. However …

Cited by 1 · Related articles · All 2 versions

[PDF] openreview.net

Impulse Control Arbitration for A Dual System of Exploitation and Exploration

C Yu, DH Mguni, D Li, A Sootla, J Wang, N Burgess - openreview.net

Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that
maximise expected reward and "explorative" ones that lead to the visitation of "novel" states …
