Reinforcement learning: An Introduction, 2nd edition RS Sutton, AG Barto MIT press, 2018 | 72490 | 2018 |
Policy gradient methods for reinforcement learning with function approximation RS Sutton, D McAllester, S Singh, Y Mansour Advances in neural information processing systems 12, 1999 | 8331 | 1999 |
Learning to predict by the methods of temporal differences RS Sutton Machine learning 3, 9-44, 1988 | 7946 | 1988 |
Reinforcement learning: An Introduction, 1st edition RS Sutton, AG Barto MIT press, 1998 | 5870* | 1998 |
Neuronlike adaptive elements that can solve difficult learning control problems AG Barto, RS Sutton, CW Anderson IEEE transactions on systems, man, and cybernetics 13 (5), 834-846, 1983 | 5064 | 1983 |
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning RS Sutton, D Precup, S Singh Artificial intelligence 112 (1-2), 181-211, 1999 | 4410 | 1999 |
Integrated architectures for learning, planning, and reacting based on approximating dynamic programming RS Sutton Proceedings of the International Conference on Machine Learning, 216-224, 1990 | 2215 | 1990 |
Generalization in reinforcement learning: Successful examples using sparse coarse coding RS Sutton Advances in neural information processing systems 8, 1995 | 1861 | 1995 |
Neural networks for control WT Miller, PJ Werbos, RS Sutton MIT press, 1990 | 1833 | 1990 |
Toward a modern theory of adaptive networks: Expectation and prediction. RS Sutton, AG Barto Psychological review 88 (2), 135, 1981 | 1820 | 1981 |
Temporal credit assignment in reinforcement learning RS Sutton University of Massachusetts, Amherst, http://www.incompleteideas.net/papers …, 1984 | 1204 | 1984 |
Dyna, an integrated architecture for learning, planning, and reacting RS Sutton ACM Sigart Bulletin 2 (4), 160-163, 1991 | 1134 | 1991 |
Introduction to reinforcement learning. Vol. 135 RS Sutton, AG Barto MIT press Cambridge 5, 21-22, 1998 | 1121 | 1998 |
Incremental natural actor-critic algorithms S Bhatnagar, RS Sutton, M Ghavamzadeh, M Lee Advances in neural information processing systems, 2008 | 1068 | 2008 |
Reinforcement learning with replacing eligibility traces SP Singh, RS Sutton Machine learning 22 (1), 123-158, 1996 | 1025 | 1996 |
Eligibility traces for off-policy policy evaluation D Precup, RS Sutton, S Singh International Conference on Machine Learning 16, 759-766, 2000 | 946 | 2000 |
Time-derivative models of Pavlovian reinforcement. RS Sutton, AG Barto Learning and Computational Neuroscience: Foundations of Adaptive Networks …, 1990 | 841 | 1990 |
Reinforcement learning is direct adaptive optimal control RS Sutton, AG Barto, RJ Williams IEEE control systems magazine 12 (2), 19-22, 1992 | 811 | 1992 |
A menu of designs for reinforcement learning over time WT Miller, RS Sutton, PJ Werbos MIT press, 1995 | 761 | 1995 |
Reinforcement Learning, An Introduction RS Sutton, AG Barto 2000 | 740* | 2000 |