More adaptive algorithms for adversarial bandits CY Wei, H Luo Conference on Learning Theory, 1263-1291, 2018 | 164 | 2018 |
Online reinforcement learning in stochastic games CY Wei, YT Hong, CJ Lu Advances in Neural Information Processing Systems 30, 2017 | 137 | 2017 |
A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free Y Chen, CW Lee, H Luo, CY Wei Conference on Learning Theory, 696-726, 2019 | 127 | 2019 |
Efficient contextual bandits in non-stationary worlds H Luo, CY Wei, A Agarwal, J Langford Conference on Learning Theory, 1739-1776, 2018 | 125 | 2018 |
Linear last-iterate convergence in constrained saddle-point optimization CY Wei, CW Lee, M Zhang, H Luo International Conference on Learning Representations, 2021 | 118* | 2021 |
Model-free reinforcement learning in infinite-horizon average-reward Markov decision processes CY Wei, MJ Jahromi, H Luo, H Sharma, R Jain International Conference on Machine Learning, 10170-10180, 2020 | 97 | 2020 |
Last-iterate convergence of decentralized optimistic gradient descent/ascent in infinite-horizon competitive Markov games CY Wei, CW Lee, M Zhang, H Luo Conference on Learning Theory, 4259-4299, 2021 | 96 | 2021 |
Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach CY Wei, H Luo Conference on Learning Theory, 4300-4354, 2021 | 95 | 2021 |
Beating stochastic and adversarial semi-bandits optimally and simultaneously J Zimmert, H Luo, CY Wei International Conference on Machine Learning, 7683-7692, 2019 | 86 | 2019 |
Tracking the best expert in non-stationary stochastic environments CY Wei, YT Hong, CJ Lu Advances in Neural Information Processing Systems 29, 2016 | 67 | 2016 |
Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence D Ding, CY Wei, K Zhang, M Jovanovic International Conference on Machine Learning, 5166-5220, 2022 | 66 | 2022 |
Learning infinite-horizon average-reward MDPs with linear function approximation CY Wei, MJ Jahromi, H Luo, R Jain International Conference on Artificial Intelligence and Statistics, 3007-3015, 2021 | 56 | 2021 |
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs CW Lee, H Luo, CY Wei, M Zhang Advances in Neural Information Processing Systems 33, 15522-15533, 2020 | 55 | 2020 |
Efficient online portfolio with logarithmic regret H Luo, CY Wei, K Zheng Advances in Neural Information Processing Systems 31, 2018 | 55 | 2018 |
Improved path-length regret bounds for bandits S Bubeck, Y Li, H Luo, CY Wei Conference on Learning Theory, 508-528, 2019 | 51 | 2019 |
A model selection approach for corruption robust reinforcement learning CY Wei, C Dann, J Zimmert International Conference on Algorithmic Learning Theory, 1043-1096, 2022 | 49 | 2022 |
Federated residual learning A Agarwal, J Langford, CY Wei arXiv preprint arXiv:2003.12880, 2020 | 47 | 2020 |
Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously CW Lee, H Luo, CY Wei, M Zhang, X Zhang International Conference on Machine Learning, 6142-6151, 2021 | 46 | 2021 |
Impossible tuning made possible: A new expert algorithm and its applications L Chen, H Luo, CY Wei Conference on Learning Theory, 1216-1259, 2021 | 45 | 2021 |
Policy optimization in adversarial MDPs: Improved exploration via dilated bonuses H Luo, CY Wei, CW Lee Advances in Neural Information Processing Systems 34, 22931-22942, 2021 | 44 | 2021 |