ESRL: Efficient sampling-based reinforcement learning for sequence generation

C Wang, H Zhou, Y Hu, Y Huo, B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …

Optimal bipartite graph matching-based goal selection for policy-based hindsight learning

S Sun, H Zhang, Z Liu, X Chen, X Lan - Neurocomputing, 2024 - Elsevier
The sparse reward problem stands as a significant challenge in the field of reinforcement
learning. Hindsight Experience Replay (HER) addresses this by goal relabeling, allowing …

REMAX: Relational representation for multi-agent exploration

H Ryu, H Shin, J Park - … of the 21st International Conference on …, 2022 - dl.acm.org
Training a multi-agent reinforcement learning (MARL) model with a sparse reward is
generally difficult because numerous combinations of interactions among agents induce a …

SEREN: Knowing When to Explore and When to Exploit

C Yu, D Mguni, D Li, A Sootla, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that
maximise expected reward and "explorative" ones that sample unvisited states. To …

Density-based curriculum for multi-goal reinforcement learning with sparse rewards

D Yang, H Zhang, X Lan, J Ding - arXiv preprint arXiv:2109.08903, 2021 - arxiv.org
Multi-goal reinforcement learning (RL) aims to qualify the agent to accomplish multi-goal
tasks, which is of great importance in learning scalable robotic manipulation skills. However …

Impulse Control Arbitration for A Dual System of Exploitation and Exploration

C Yu, DH Mguni, D Li, A Sootla, J Wang, N Burgess - openreview.net
Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that
maximise expected reward and "explorative" ones that lead to the visitation of "novel" states …