Ziniu Li
Other names: Zi-Niu Li
The Chinese University of Hong Kong, Shenzhen
Verified email at link.cuhk.edu.cn
Title
Cited by
Year
Error bounds of imitating policies and environments
T Xu, Z Li, Y Yu
Advances in Neural Information Processing Systems 33, 15737-15749, 2020
Cited by 97, 2020
Error bounds of imitating policies and environments for reinforcement learning
T Xu, Z Li, Y Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), 6968 …, 2021
Cited by 33, 2021
Self-Guided Evolution Strategies with Historical Estimated Gradients
FY Liu, ZN Li, C Qian
IJCAI, 1474-1480, 2020
Cited by 19, 2020
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Z Li, Y Li, Y Zhang, T Zhang, ZQ Luo
International Conference on Learning Representations, 2022
Cited by 16, 2022
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Z Li, T Xu, Y Zhang, Z Lin, Y Yu, R Sun, ZQ Luo
Forty-first International Conference on Machine Learning, 2023
Cited by 14*, 2023
Rethinking ValueDice - Does It Really Improve Performance?
Z Li, T Xu, Y Yu, ZQ Luo
ICLR Blog, 2022
Cited by 13, 2022
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis
T Xu, Z Li, Y Yu, ZQ Luo
arXiv preprint arXiv:2208.01899, 2022
Cited by 11*, 2022
When is RL better than DPO in RLHF? A Representation and Optimization Perspective
Z Li, T Xu, Y Yu
ICLR Tiny Paper, 2024
Cited by 7*, 2024
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
T Xu, Z Li, Y Yu, ZQ Luo
UAI, 2367-2378, 2023
Cited by 7, 2023
Imitation learning from imperfection: Theoretical justifications and algorithms
Z Li, T Xu, Z Qin, Y Yu, ZQ Luo
Advances in Neural Information Processing Systems 36, 2024
Cited by 6*, 2024
Why transformers need adam: A hessian perspective
Y Zhang, C Chen, T Ding, Z Li, R Sun, ZQ Luo
arXiv preprint arXiv:2402.16788, 2024
Cited by 5, 2024
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
J Xiao, Z Li, X Xie, E Getzen, C Fang, Q Long, WJ Su
arXiv preprint arXiv:2405.16455, 2024
Cited by 3, 2024
A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle
Z Li, T Xu, Y Yu
arXiv preprint arXiv:2203.11489, 2022
Cited by 1, 2022
Efficient Exploration by Novelty-Pursuit
Z Li, XH Chen
Distributed Artificial Intelligence: Second International Conference, DAI …, 2020
Cited by 1, 2020
Adam-mini: Use Fewer Learning Rates To Gain More
Y Zhang, C Chen, Z Li, T Ding, C Wu, Y Ye, ZQ Luo, R Sun
arXiv preprint arXiv:2406.16793, 2024
2024
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
C Jia, P Wang, Z Li, YC Li, Z Zhang, N Tang, Y Yu
arXiv preprint arXiv:2405.17039, 2024
2024