Approximate dynamic programming using fluid and diffusion approximations with applications...

A Wierman, Z Liu, I Liu… - International Green …, 2014 - ieeexplore.ieee.org

This paper surveys the opportunities and challenges in an emerging area of research that
has the potential to significantly ease the incorporation of renewable energy into the grid as …

Cited by 236 · Related articles · All 10 versions

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 152 · Related articles · All 3 versions

Queueing network controls via deep reinforcement learning

JG Dai, M Gluzman - Stochastic Systems, 2022 - pubsonline.informs.org

Novel advanced policy gradient (APG) methods, such as trust region policy optimization and
proximal policy optimization (PPO), have become the dominant reinforcement learning …

Cited by 70 · Related articles · All 4 versions

Mean-field controls with Q-learning for cooperative MARL: convergence and complexity analysis

H Gu, X Guo, X Wei, R Xu - SIAM Journal on Mathematics of Data Science, 2021 - SIAM

Multi-agent reinforcement learning (MARL), despite its popularity and empirical success,
suffers from the curse of dimensionality. This paper builds the mathematical framework to …

Cited by 74 · Related articles · All 3 versions

Online trajectory and radio resource optimization of cache-enabled UAV wireless networks with content and energy recharging

S Chai, VKN Lau - IEEE Transactions on Signal Processing, 2020 - ieeexplore.ieee.org

Recently, unmanned aerial vehicle (UAV)-assisted wireless communication technology has
been proposed to exploit the favorable propagation property and flexibility of air-to-ground …

Cited by 60 · Related articles · All 5 versions

Mean field for Markov decision processes: from discrete to continuous optimization

N Gast, B Gaujal, JY Le Boudec - IEEE Transactions on …, 2012 - ieeexplore.ieee.org

We study the convergence of Markov decision processes, composed of a large number of
objects, to optimization problems on ordinary differential equations. We show that the …

Cited by 126 · Related articles · All 28 versions

Power control of an energy harvesting sensor for remote state estimation

Y Li, F Zhang, DE Quevedo, V Lau… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

We investigate sensor transmission power control for remote state estimation. Instead of
using a conventional sensor, a sensor equipped with an energy harvester which can obtain …

Cited by 74 · Related articles · All 9 versions

Admission control for a multi-server queue with abandonment

YL Koçağa, AR Ward - Queueing Systems, 2010 - Springer

In an M/M/N+M queue, when there are many customers waiting, it may be preferable
to reject a new arrival rather than risk that arrival later abandoning without receiving service …

Cited by 66 · Related articles · All 12 versions

Dynamic scheduling of home care patients to medical providers

AA Cire, A Diamant - Production and Operations …, 2022 - journals.sagepub.com

Home care provides personalized medical care and social support to patients within their
own homes. Our work proposes a dynamic scheduling framework to assist in the assignment …

Cited by 8 · Related articles · All 4 versions

Feature Selection for Neuro-Dynamic Programming

D Huang, W Chen, P Mehta, S Meyn… - … for Feedback Control, 2012 - Wiley Online Library

Neuro-dynamic programming encompasses techniques from both reinforcement learning
and approximate dynamic programming. Feature selection refers to the choice of basis that …

Cited by 19 · Related articles · All 7 versions
