Authors
Tapas K Das, Abhijit Gosavi, Sridhar Mahadevan, Nicholas Marchalleck
Publication date
1999/4
Journal
Management Science
Volume
45
Issue
4
Pages
560-574
Publisher
INFORMS
Description
A large class of problems of sequential decision making under uncertainty, in which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow intractably with the size of the problem and its related data. Furthermore, these techniques require, for each action, the one-step transition probability and reward matrices, and obtaining these is often unrealistic for large and complex systems. Recently, there has been much interest in a simulation-based stochastic approximation framework called reinforcement learning (RL) for computing near-optimal policies for MDPs. RL has been successfully applied to very large problems, such as elevator scheduling, and dynamic …
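To make the contrast in the abstract concrete, the following is an illustrative sketch (not taken from the paper) of classical value iteration on a hypothetical two-state, two-action MDP. Note that it presupposes exactly what the abstract says is often unrealistic for large systems: full knowledge of the one-step transition probability and reward matrices for each action. The matrices `P` and `R`, the discount factor, and the state/action sizes are all invented for illustration.

```python
import numpy as np

# Hypothetical model data (assumed known here; RL methods avoid this requirement).
# P[a, s, s'] : probability of moving s -> s' under action a.
# R[a, s]     : expected one-step reward for taking action a in state s.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # action 0
              [[0.2, 0.8], [0.7, 0.3]]])  # action 1
R = np.array([[1.0, 0.0],                 # action 0
              [0.5, 2.0]])                # action 1
gamma = 0.9                               # discount factor

# Value iteration: repeated Bellman optimality backups until convergence.
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V                 # Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
    V_new = Q.max(axis=0)                 # greedy backup over actions
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)                 # greedy policy for each state
print("V =", V, "policy =", policy)
```

A simulation-based RL method would instead update value estimates incrementally from sampled transitions, never forming `P` or `R` explicitly, which is what makes it attractive for the large problems the abstract mentions.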
Total citations
Per-year citation histogram, 1999–2024 (counts not recoverable from extraction)