Data-efficient off-policy policy evaluation for reinforcement learning. P Thomas, E Brunskill. International Conference on Machine Learning, 2139-2148, 2016. Cited by 722.
Value function approximation in reinforcement learning using the Fourier basis. G Konidaris, S Osentoski, P Thomas. Proceedings of the AAAI Conference on Artificial Intelligence 25 (1), 380-385, 2011. Cited by 545.
High-confidence off-policy evaluation. P Thomas, G Theocharous, M Ghavamzadeh. Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015. Cited by 315.
High confidence policy improvement. P Thomas, G Theocharous, M Ghavamzadeh. International Conference on Machine Learning, 2380-2388, 2015. Cited by 220.
Ad recommendation systems for life-time value optimization. G Theocharous, PS Thomas, M Ghavamzadeh. Proceedings of the 24th International Conference on World Wide Web, 1305-1310, 2015. Cited by 198.
Preventing undesirable behavior of intelligent machines. P Thomas, B Castro da Silva, A Barto, S Giguere, Y Brun, E Brunskill. Science 366 (6468), 999-1004, 2019. Cited by 195.
Learning action representations for reinforcement learning. Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas. International Conference on Machine Learning, 941-950, 2019. Cited by 187.
Increasing the action gap: New operators for reinforcement learning. MG Bellemare, G Ostrovski, A Guez, P Thomas, R Munos. Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016. Cited by 170.
Bias in natural actor-critic algorithms. P Thomas. International Conference on Machine Learning, 441-448, 2014. Cited by 158.
Safe reinforcement learning. PS Thomas. 2015. Cited by 119.
Optimizing for the future in non-stationary MDPs. Y Chandak, G Theocharous, S Shankar, M White, S Mahadevan, et al. International Conference on Machine Learning, 1414-1425, 2020. Cited by 71.
Is the policy gradient a gradient? C Nota, PS Thomas. arXiv preprint arXiv:1906.07073, 2019. Cited by 70.
Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces. S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, et al. arXiv preprint arXiv:1405.6757, 2014. Cited by 69.
Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards. KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10), …, 2017. Cited by 67.
Evaluating the performance of reinforcement learning algorithms. S Jordan, Y Chandak, D Cohen, M Zhang, P Thomas. International Conference on Machine Learning, 4962-4973, 2020. Cited by 66.
Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing. P Thomas, G Theocharous, M Ghavamzadeh, I Durugkar, E Brunskill. Proceedings of the AAAI Conference on Artificial Intelligence 31 (2), 4740-4745, 2017. Cited by 64.
Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines. PS Thomas, E Brunskill. arXiv preprint arXiv:1706.06643, 2017. Cited by 62.
Importance Sampling for Fair Policy Selection. S Doroudi, PS Thomas, E Brunskill. Grantee Submission, 2017. Cited by 57.
Risk Quantification for Policy Deployment. PS Thomas, G Theocharous, M Ghavamzadeh. US Patent App. 14/552,047, 2016. Cited by 57.
Offline contextual bandits with high probability fairness guarantees. B Metevier, S Giguere, S Brockman, A Kobren, Y Brun, E Brunskill, et al. Advances in Neural Information Processing Systems 32, 2019. Cited by 54.