A finite time analysis of temporal difference learning with linear function approximation J Bhandari, D Russo, R Singal Conference on Learning Theory, 1691-1692, 2018 | 386 | 2018 |
Global optimality guarantees for policy gradient methods J Bhandari, D Russo Operations Research, 2024 | 250 | 2024 |
On the linear convergence of policy gradient methods for finite MDPs J Bhandari, D Russo International Conference on Artificial Intelligence and Statistics, 2386-2394, 2021 | 70 | 2021 |
A note on the linear convergence of policy gradient methods J Bhandari, D Russo arXiv preprint arXiv:2007.11120, 79, 2020 | 25 | 2020 |
On the tightness of an LP relaxation for rational optimization and its applications V Avadhanula, J Bhandari, V Goyal, A Zeevi Operations Research Letters 44 (5), 612-617, 2016 | 13 | 2016 |
Elliptical Slice Sampling with Expectation Propagation F Fagan, J Bhandari, JP Cunningham UAI, 2016 | 11 | 2016 |
Optimizing long-term value for auction-based recommender systems via on-policy reinforcement learning R Xu, J Bhandari, D Korenkevych, F Liu, Y He, A Nikulkov, Z Zhu Proceedings of the 17th ACM Conference on Recommender Systems, 955-962, 2023 | 6 | 2023 |
Optimization foundations of reinforcement learning J Bhandari Columbia University, 2020 | 6 | 2020 |
Pearl: A Production-ready Reinforcement Learning Agent Z Zhu, RS Braz, J Bhandari, D Jiang, Y Wan, Y Efroni, L Wang, R Xu, ... arXiv preprint arXiv:2312.03814, 2023 | 2 | 2023 |
Multi-objective customer journey optimization J Bhandari, W Dai, Jun He, T Xu, Z Yan, Lei Zhang US Patent 20,210,217,047, 2021 | | 2021 |
Annular Augmentation Sampling F Fagan, J Bhandari, J Cunningham Artificial Intelligence and Statistics, 139-147, 2017 | | 2017 |
User Scheduling in Cognitive Radio Networks J Bhandari, N Bolia Journal of Computations & Modelling 3 (3), 177-193, 2013 | | 2013 |