S. Casper, X. Davies, C. Shi, T. Krendl Gilbert, et al. - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to fine-tune state …