Solving very large weakly coupled Markov decision processes

C Boutilier, T Dean, S Hanks - Journal of Artificial Intelligence Research, 1999 - jair.org

Planning under uncertainty is a central problem in the study of automated sequential
decision making, and has been addressed by researchers in many different fields, including …

Cited by 1602 · Related articles · All 27 versions

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

RS Sutton, D Precup, S Singh - Artificial intelligence, 1999 - Elsevier

Learning, planning, and representing knowledge at multiple levels of temporal abstraction
are key, longstanding challenges for AI. In this paper we consider how these challenges can …

Cited by 4703 · Related articles · All 39 versions

A sparse sampling algorithm for near-optimal planning in large Markov decision processes

M Kearns, Y Mansour, AY Ng - Machine learning, 2002 - Springer

A critical issue for the application of Markov decision processes (MDPs) to realistic problems
is how the complexity of planning scales with the size of the MDP. In stochastic …

Cited by 799 · Related articles · All 34 versions

Efficient solution algorithms for factored MDPs

C Guestrin, D Koller, R Parr, S Venkataraman - Journal of Artificial …, 2003 - jair.org

This paper addresses the problem of planning under uncertainty in large Markov Decision
Processes (MDPs). Factored MDPs represent a complex state space using state variables …

Cited by 701 · Related articles · All 31 versions

Stochastic dynamic programming with factored representations

C Boutilier, R Dearden, M Goldszmidt - Artificial intelligence, 2000 - Elsevier

Markov decision processes (MDPs) have proven to be popular models for decision-theoretic
planning, but standard dynamic programming algorithms for solving MDPs rely on explicit …

Cited by 640 · Related articles · All 21 versions

[BOOK] Temporal abstraction in reinforcement learning

D Precup - 2000 - search.proquest.com

Decision making usually involves choosing among different courses of action over a broad
range of time scales. For instance, a person planning a trip to a distant location makes high …

Cited by 415 · Related articles · All 5 versions

Scalable reinforcement learning of localized policies for multi-agent networked systems

G Qu, A Wierman, N Li - Learning for Dynamics and Control, 2020 - proceedings.mlr.press

We study reinforcement learning (RL) in a setting with a network of agents whose states and
actions interact in a local manner where the objective is to find localized policies such that …

Cited by 114 · Related articles · All 11 versions

Online self-reconfiguration with performance guarantee for energy-efficient large-scale cloud computing data centers

H Mi, H Wang, G Yin, Y Zhou, D Shi… - 2010 IEEE International …, 2010 - ieeexplore.ieee.org

In a typical large-scale data center, a set of applications are hosted over virtual machines
(VMs) running on a large number of physical machines (PMs). Such a virtualization …

Cited by 276 · Related articles · All 6 versions

[PDF] Efficient reinforcement learning in factored MDPs

M Kearns, D Koller - IJCAI, 1999 - Citeseer

We present a provably efficient and near-optimal algorithm for reinforcement learning in
Markov decision processes (MDPs) whose transition model can be factored as a dynamic …

Cited by 278 · Related articles · All 18 versions

[BOOK] Exploiting structure to efficiently solve large scale partially observable Markov decision processes

P Poupart - 2005 - Citeseer

Partially observable Markov decision processes (POMDPs) provide a natural and principled
framework to model a wide range of sequential decision making problems under …

Cited by 345 · Related articles · All 9 versions
