C Wang, H Zhou, Y Hu, Y Huo,
B Li, T Liu… - Proceedings of the …, 2024 - ojs.aaai.org
Applying Reinforcement Learning (RL) to sequence generation models enables the direct
optimization of long-term rewards (e.g., BLEU and human feedback), but typically …