| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Secrets of RLHF in Large Language Models Part I: PPO | R Zheng, S Dou, S Gao, Y Hua, W Shen, B Wang, Y Liu, S Jin, Q Liu, ... | NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following (best …) | 69* | 2023 |
| Secrets of RLHF in Large Language Models Part II: Reward Modeling | B Wang, R Zheng, L Chen, Y Liu, S Dou, C Huang, W Shen, S Jin, E Zhou, ... | arXiv preprint arXiv:2401.06080 | 32* | 2024 |
| LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment | S Dou, E Zhou, Y Liu, S Gao, J Zhao, W Shen, Y Zhou, Z Xi, X Wang, ... | arXiv preprint arXiv:2312.09979 | 21* | 2023 |
| Loose Lips Sink Ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback | W Shen, R Zheng, W Zhan, J Zhao, S Dou, T Gui, Q Zhang, X Huang | The 2023 Conference on Empirical Methods in Natural Language Processing | 11 | 2023 |
| Human-Instruction-Free LLM Self-Alignment with Limited Samples | H Guo, Y Yao, W Shen, J Wei, X Zhang, Z Wang, Y Liu | arXiv preprint arXiv:2401.06785 | 6 | 2024 |
| Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation | X Zhang, JF Ton, W Shen, H Wang, Y Liu | arXiv preprint arXiv:2403.05171 | 4 | 2024 |
| Improving Generalization of Alignment with Human Preferences through Group Invariant Learning | R Zheng, W Shen, Y Hua, W Lai, S Dou, Y Zhou, Z Xi, X Wang, H Huang, ... | Twelfth International Conference on Learning Representations (ICLR 2024) | 3 | 2023 |
| Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Z Xi, W Chen, B Hong, S Jin, R Zheng, W He, Y Ding, S Liu, X Guo, ... | arXiv preprint arXiv:2402.05808 | 2 | 2024 |
| StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | S Dou, Y Liu, H Jia, L Xiong, E Zhou, J Shan, C Huang, W Shen, X Fan, ... | arXiv preprint arXiv:2402.01391 | 2 | 2024 |
| Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards | W Shen, X Zhang, Y Yao, R Zheng, H Guo, Y Liu | arXiv preprint arXiv:2403.07708 | 1 | 2024 |
| Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback | S Gao, Q Ge, W Shen, S Dou, J Ye, X Wang, R Zheng, Y Zou, Z Chen, ... | arXiv preprint arXiv:2401.11458 | 1 | 2024 |