Authors
Tapas K Das, Abhijit Gosavi, Sridhar Mahadevan, Nicholas Marchalleck
Publication date
1999/4
Journal
Management Science
Volume
45
Issue
4
Pages
560-574
Publisher
INFORMS
Description
A large class of problems of sequential decision making under uncertainty, in which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow intractably with the size of the problem and its related data. Furthermore, these techniques require, for each action, the one-step transition probability and reward matrices, and obtaining these is often unrealistic for large and complex systems. Recently, there has been much interest in a simulation-based stochastic approximation framework called reinforcement learning (RL) for computing near-optimal policies for MDPs. RL has been successfully applied to very large problems, such as elevator scheduling, and dynamic …
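To make the contrast in the abstract concrete, the following is an illustrative sketch (not taken from the paper) of classical value iteration on a hypothetical two-state, two-action MDP. Note that it presupposes exactly what the abstract says is often unrealistic for large systems: full knowledge of the one-step transition probability and reward matrices for each action. The matrices `P` and `R`, the discount factor, and the state/action sizes are all invented for illustration.

```python
import numpy as np

# Hypothetical model data (assumed known here; RL methods avoid this requirement).
# P[a, s, s'] : probability of moving s -> s' under action a.
# R[a, s]     : expected one-step reward for taking action a in state s.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # action 0
              [[0.2, 0.8], [0.7, 0.3]]])  # action 1
R = np.array([[1.0, 0.0],                 # action 0
              [0.5, 2.0]])                # action 1
gamma = 0.9                               # discount factor

# Value iteration: repeated Bellman optimality backups until convergence.
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V                 # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
    V_new = Q.max(axis=0)                 # greedy backup over actions
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)                 # greedy policy for each state
print("V =", V, "policy =", policy)
```

A simulation-based RL method would instead update value estimates incrementally from sampled transitions, never forming `P` or `R` explicitly, which is what makes it attractive for the large problems the abstract mentions.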
Total citations
Per-year citation histogram, 1999–2024 (counts not recoverable from extraction)