D Gao, I Miller, A Allami, D Lin - 2024 IEEE 6th International …, 2024 - ieeexplore.ieee.org
Leveraging the scalable efficacy of reinforcement learning from AI feedback (RLAIF), large
language models (LLMs) can be refined toward human intent alignment. While current …