Online learning in Markov decision processes with changing cost sequences- 学术资源搜索

Online learning in Markov decision processes with changing cost sequences

T Dick, A Gyorgy, C Szepesvari - … Conference on Machine …, 2014 - proceedings.mlr.press

International Conference on Machine Learning, 2014•proceedings.mlr.press

Abstract

In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full and bandit-information. We propose to view this problem as an instance of online linear optimization. We propose two methods for this problem: MD^ 2 (mirror descent with approximate projections) and the continuous exponential weights algorithm with Dikin walks. We provide a rigorous complexity analysis of these techniques, while providing near-optimal regret-bounds (in particular, we take into account the computational costs of performing approximate projections in MD^ 2). In the case of full-information feedback, our results complement existing ones. In the case of bandit-information feedback we consider the online stochastic shortest path problem, a special case of the above MDP problems, and manage to improve the existing results by removing the previous restrictive assumption that the state-visitation probabilities are uniformly bounded away from zero under all policies.

proceedings.mlr.press

展开收起

被引用次数：88 相关文章所有 4 个版本

以上显示的是最相近的搜索结果。查看全部搜索结果

高级搜索

QQ 群

Online learning in Markov decision processes with changing cost sequences

引用