Almost optimal model-free reinforcement learning via reference-advantage decomposition Z Zhang, Y Zhou, X Ji Advances in Neural Information Processing Systems 33, 15198-15207, 2020 | 167 | 2020 |
Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon Z Zhang, X Ji, S Du Conference on Learning Theory, 4528-4531, 2021 | 116 | 2021 |
Regret minimization for reinforcement learning by evaluating the optimal bias function Z Zhang, X Ji Advances in Neural Information Processing Systems 32, 2019 | 77 | 2019 |
Improved variance-aware confidence sets for linear bandits and linear mixture MDP Z Zhang, J Yang, X Ji, SS Du Advances in Neural Information Processing Systems 34, 4342-4355, 2021 | 62* | 2021 |
Near optimal reward-free reinforcement learning Z Zhang, S Du, X Ji International Conference on Machine Learning, 12402-12412, 2021 | 55* | 2021 |
Model-free reinforcement learning: from clipped pseudo-regret to sample complexity Z Zhang, Y Zhou, X Ji International Conference on Machine Learning, 12653-12662, 2021 | 39 | 2021 |
Horizon-free reinforcement learning in polynomial time: the power of stationary policies Z Zhang, X Ji, S Du Conference on Learning Theory, 3858-3904, 2022 | 26 | 2022 |
Settling the sample complexity of online reinforcement learning Z Zhang, Y Chen, JD Lee, SS Du The Thirty Seventh Annual Conference on Learning Theory, 5213-5219, 2024 | 15 | 2024 |
Near-optimal regret bounds for multi-batch reinforcement learning Z Zhang, Y Jiang, Y Zhou, X Ji Advances in Neural Information Processing Systems 35, 24586-24596, 2022 | 11 | 2022 |
Sharp variance-dependent bounds in reinforcement learning: Best of both worlds in stochastic and deterministic environments R Zhou, Z Zhang, SS Du International Conference on Machine Learning, 42878-42914, 2023 | 9 | 2023 |
Sharper model-free reinforcement learning for average-reward Markov decision processes Z Zhang, Q Xie The Thirty Sixth Annual Conference on Learning Theory, 5476-5477, 2023 | 8 | 2023 |
Optimal multi-distribution learning Z Zhang, W Zhan, Y Chen, SS Du, JD Lee The Thirty Seventh Annual Conference on Learning Theory, 5220-5223, 2024 | 6 | 2024 |
Almost optimal batch-regret tradeoff for batch linear contextual bandits Z Zhang, X Ji, Y Zhou arXiv preprint arXiv:2110.08057, 2021 | 5 | 2021 |
Horizon-Free Regret for Linear Markov Decision Processes Z Zhang, JD Lee, Y Chen, SS Du arXiv preprint arXiv:2403.10738, 2024 | 1 | 2024 |