Guided dialog policy learning: Reward estimation for multi-domain task-oriented dialog R Takanobu, H Zhu, M Huang Conference on Empirical Methods in Natural Language Processing, 100-110, 2019 | 87 | 2019 |
Starling-7B: Improving LLM helpfulness & harmlessness with RLAIF B Zhu, E Frick, T Wu, H Zhu, J Jiao November, 2023 | 45 | 2023 |
Optimal conservative offline RL with general function approximation via augmented Lagrangian P Rashidinejad, H Zhu, K Yang, S Russell, J Jiao arXiv preprint arXiv:2211.00716, 2022 | 32 | 2022 |
Vector-matrix-vector queries for solving linear algebra, statistics, and graph problems C Rashtchian, DP Woodruff, H Zhu Approximation, Randomization, and Combinatorial Optimization. Algorithms and …, 2020 | 31 | 2020 |
Importance weighted actor-critic for optimal conservative offline reinforcement learning H Zhu, P Rashidinejad, J Jiao Advances in Neural Information Processing Systems 36, 2024 | 10 | 2024 |
Learning personalized story evaluation D Wang, K Yang, H Zhu, X Yang, A Cohen, L Li, Y Tian arXiv preprint arXiv:2310.03304, 2023 | 7 | 2023 |
Towards optimal statistical watermarking B Huang, B Zhu, H Zhu, JD Lee, J Jiao, MI Jordan arXiv preprint arXiv:2312.07930, 2023 | 5 | 2023 |
Provably efficient reinforcement learning via surprise bound H Zhu, R Wang, J Lee International Conference on Artificial Intelligence and Statistics, 4006-4032, 2023 | 5 | 2023 |
Average-case communication complexity of statistical problems C Rashtchian, D Woodruff, P Ye, H Zhu Conference on Learning Theory, 3859-3886, 2021 | 5 | 2021 |
Provably efficient offline goal-conditioned reinforcement learning with general function approximation and single-policy concentrability H Zhu, A Zhang Advances in Neural Information Processing Systems 36, 2024 | 4 | 2024 |
On Representation Complexity of Model-based and Model-free Reinforcement Learning H Zhu, B Huang, S Russell arXiv preprint arXiv:2310.01706, 2023 | 3 | 2023 |
End-to-end Story Plot Generator H Zhu, A Cohen, D Wang, K Yang, X Yang, J Jiao, Y Tian arXiv preprint arXiv:2310.08796, 2023 | 2 | 2023 |
Efficient Prompt Caching via Embedding Similarity H Zhu, B Zhu, J Jiao arXiv preprint arXiv:2402.01173, 2024 | 1 | 2024 |
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics H Zhu, B Huang, S Zhang, M Jordan, J Jiao, Y Tian, S Russell arXiv preprint arXiv:2405.04669, 2024 | | 2024 |
Avoiding Catastrophe in Continuous Spaces by Asking for Help B Plaut, H Zhu, S Russell arXiv preprint arXiv:2402.08062, 2024 | | 2024 |