Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint. W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang. Forty-first International Conference on Machine Learning, 2024 | 64* | 2024 |
Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes. C Ye, W Xiong, Q Gu, T Zhang. International Conference on Machine Learning, 39834-39863, 2023 | 19 | 2023 |
A theoretical analysis of Nash learning from human feedback under general KL-regularized preference. C Ye, W Xiong, Y Zhang, N Jiang, T Zhang. arXiv preprint arXiv:2402.07314, 2024 | 18 | 2024 |
Corruption-Robust Offline Reinforcement Learning with General Function Approximation. C Ye, R Yang, Q Gu, T Zhang. Neural Information Processing Systems, 2023 | 10 | 2023 |
Optimal sample selection through uncertainty estimation and its application in deep learning. Y Lin, C Liu, C Ye, Q Lian, Y Yao, T Zhang. arXiv preprint arXiv:2309.02476, 2023 | 3 | 2023 |
Towards robust model-based reinforcement learning against adversarial corruption. C Ye, J He, Q Gu, T Zhang. arXiv preprint arXiv:2402.08991, 2024 | 2 | 2024 |
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks. J Fan, Z Wang, Z Yang, C Ye. arXiv preprint arXiv:2311.13180, 2023 | | 2023 |