A high school student can create deep Q-learning code to control her robot, without any understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …
JG Dai, M Gluzman - Stochastic Systems, 2022 - pubsonline.informs.org
Novel advanced policy gradient (APG) methods, such as trust region policy optimization and proximal policy optimization (PPO), have become the dominant reinforcement learning …
H Gu, X Guo, X Wei, R Xu - SIAM Journal on Mathematics of Data Science, 2021 - SIAM
Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds the mathematical framework to …
S Chai, VKN Lau - IEEE Transactions on Signal Processing, 2020 - ieeexplore.ieee.org
Recently, unmanned aerial vehicle (UAV)-assisted wireless communication technology has been proposed to exploit the favorable propagation property and flexibility of air-to-ground …
We study the convergence of Markov decision processes, composed of a large number of objects, to optimization problems on ordinary differential equations. We show that the …
We investigate sensor transmission power control for remote state estimation. Instead of using a conventional sensor, a sensor equipped with an energy harvester which can obtain …
In an M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service …
AA Cire, A Diamant - Production and Operations …, 2022 - journals.sagepub.com
Home care provides personalized medical care and social support to patients within their own homes. Our work proposes a dynamic scheduling framework to assist in the assignment …
Neuro‐dynamic programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that …