Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models. M Lin, S Shi, Y Guo, B Chalaki, V Tadiparthi, EM Pari, S Stepputtis, ... Conference on Empirical Methods in Natural Language Processing (EMNLP) Findings, 2024.
A Reward Analysis of Reinforcement Learning from Large Language Model Feedback. M Lin, S Shi, Y Guo, B Chalaki, V Tadiparthi, S Stepputtis, J Campbell, ... Workshop on Reinforcement Learning Beyond Rewards @ Reinforcement Learning …