E Duryea, M Ganger, W Hu - Intelligent Control and Automation, 2016 - scirp.org
Q-learning is a popular temporal-difference reinforcement learning algorithm which often
explicitly stores state values using lookup tables. This implementation has been proven to …