Analysis of temporal-difference learning with function approximation J Tsitsiklis, B Van Roy Advances in Neural Information Processing Systems 9, 1996 | 2179 | 1996 |
Deep exploration via bootstrapped DQN I Osband, C Blundell, A Pritzel, B Van Roy Advances in Neural Information Processing Systems 29, 2016 | 1445 | 2016 |
A tutorial on Thompson sampling D Russo, B Van Roy, A Kazerouni, I Osband, Z Wen Foundations and Trends in Machine Learning 11 (1), pp. 1-96, 2018 | 1106 | 2018 |
The linear programming approach to approximate dynamic programming DP De Farias, B Van Roy Operations Research 51 (6), 850-865, 2003 | 968 | 2003 |
Regression methods for pricing complex American-style options JN Tsitsiklis, B Van Roy IEEE Transactions on Neural Networks 12 (4), 694-703, 2001 | 862 | 2001 |
Learning to optimize via posterior sampling D Russo, B Van Roy Mathematics of Operations Research 39 (4), 1221-1243, 2014 | 734 | 2014 |
Feature-based methods for large scale dynamic programming JN Tsitsiklis, B Van Roy Machine Learning 22 (1), 59-94, 1996 | 713 | 1996 |
Markov perfect industry dynamics with many firms G Weintraub, CL Benkard, B Van Roy Econometrica 76 (6), 1375-1411, 2008 | 567 | 2008 |
On constraint sampling in the linear programming approach to approximate dynamic programming DP De Farias, B Van Roy Mathematics of Operations Research 29 (3), 462-478, 2004 | 490 | 2004 |
Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives JN Tsitsiklis, B Van Roy IEEE Transactions on Automatic Control 44 (10), 1840-1851, 1999 | 477 | 1999 |
An information-theoretic analysis of Thompson sampling D Russo, B Van Roy Journal of Machine Learning Research 17 (68), 1-30, 2016 | 417 | 2016 |
Generalization and exploration via randomized value functions I Osband, B Van Roy, Z Wen International Conference on Machine Learning, 2377-2386, 2016 | 327 | 2016 |
Deep exploration via randomized value functions I Osband, B Van Roy, DJ Russo, Z Wen Journal of Machine Learning Research 20 (124), 1-62, 2019 | 326 | 2019 |
Consensus propagation CC Moallemi, B Van Roy IEEE Transactions on Information Theory 52 (11), 4753-4766, 2006 | 302 | 2006 |
Solving data mining problems through pattern recognition RL Kennedy, Y Lee, B Van Roy, CD Reed, RP Lippmann Upper Saddle River, NJ: Prentice Hall PTR, 2011 | 269* | 2011 |
Dynamic pricing with a prior on market response VF Farias, B Van Roy Operations Research 58 (1), 16-29, 2010 | 269 | 2010 |
Why is posterior sampling better than optimism for reinforcement learning? I Osband, B Van Roy International Conference on Machine Learning, 2701-2710, 2017 | 268 | 2017 |
Eluder dimension and the sample complexity of optimistic exploration D Russo, B Van Roy Advances in Neural Information Processing Systems 26, 2013 | 258 | 2013 |
A neuro-dynamic programming approach to retailer inventory management B Van Roy, DP Bertsekas, Y Lee, JN Tsitsiklis Proceedings of the 36th IEEE Conference on Decision and Control 4, 4052-4057, 1997 | 240 | 1997 |
Learning to optimize via information-directed sampling D Russo, B Van Roy Advances in Neural Information Processing Systems 27, 2014 | 238 | 2014 |