Provably efficient reinforcement learning for discounted MDPs with feature mapping D Zhou, J He, Q Gu International Conference on Machine Learning, 12793-12802, 2021 | 139 | 2021 |
Logarithmic regret for reinforcement learning with linear function approximation J He, D Zhou, Q Gu International Conference on Machine Learning, 4171-4180, 2021 | 103 | 2021 |
Nearly minimax optimal reinforcement learning for linear Markov decision processes J He, H Zhao, D Zhou, Q Gu International Conference on Machine Learning, 12790-12822, 2023 | 52 | 2023 |
Nearly optimal algorithms for linear contextual bandits with adversarial corruptions J He, D Zhou, T Zhang, Q Gu Advances in Neural Information Processing Systems 35, 34614-34625, 2022 | 41 | 2022 |
Nearly minimax optimal reinforcement learning for discounted MDPs J He, D Zhou, Q Gu Advances in Neural Information Processing Systems 34, 2021 | 39 | 2021 |
A simple and provably efficient algorithm for asynchronous federated contextual linear bandits J He, T Wang, Y Min, Q Gu Advances in Neural Information Processing Systems 35, 4762-4775, 2022 | 31 | 2022 |
Learning stochastic shortest path with linear function approximation Y Min, J He, T Wang, Q Gu International Conference on Machine Learning, 15584-15629, 2022 | 31 | 2022 |
Achieving a fairer future by changing the past J He, AD Procaccia, CA Psomas, D Zeng IJCAI'19, 2019 | 30 | 2019 |
Near-optimal policy optimization algorithms for learning adversarial linear mixture MDPs J He, D Zhou, Q Gu International Conference on Artificial Intelligence and Statistics, 4259-4280, 2022 | 29* | 2022 |
Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency H Zhao, J He, D Zhou, T Zhang, Q Gu The Thirty Sixth Annual Conference on Learning Theory, 4977-5020, 2023 | 22 | 2023 |
Uniform-PAC bounds for reinforcement learning with linear function approximation J He, D Zhou, Q Gu Advances in Neural Information Processing Systems 34, 2021 | 17 | 2021 |
Locally differentially private reinforcement learning for linear mixture Markov decision processes C Liao, J He, Q Gu Asian Conference on Machine Learning, 627-642, 2023 | 13 | 2023 |
Bandit learning with general function classes: Heteroscedastic noise and variance-dependent regret bounds H Zhao, D Zhou, J He, Q Gu | 12 | 2022 |
On the sample complexity of learning infinite-horizon discounted linear kernel MDPs Y Chen, J He, Q Gu International Conference on Machine Learning, 3149-3183, 2022 | 8 | 2022 |
On the interplay between misspecification and sub-optimality gap in linear contextual bandits W Zhang, J He, Z Fan, Q Gu International Conference on Machine Learning, 41111-41132, 2023 | 7 | 2023 |
Minimax optimal reinforcement learning for discounted MDPs J He, D Zhou, Q Gu arXiv preprint arXiv:2010.00587, 2020 | 7 | 2020 |
Reinforcement learning from human feedback with active queries K Ji, J He, Q Gu arXiv preprint arXiv:2402.09401, 2024 | 5 | 2024 |
Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation Y Min, J He, T Wang, Q Gu International Conference on Machine Learning, 24785-24811, 2023 | 5 | 2023 |
Pessimistic nonlinear least-squares value iteration for offline reinforcement learning Q Di, H Zhao, J He, Q Gu arXiv preprint arXiv:2310.01380, 2023 | 4 | 2023 |
A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation H Zhao, J He, Q Gu arXiv preprint arXiv:2311.15238, 2023 | 3 | 2023 |