[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …
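For a concrete picture of the 'Q' part referenced in the blurb, here is a minimal tabular Q-learning sketch on a hypothetical 5-state chain MDP; the toy environment, step size, and horizon are illustrative assumptions, not examples from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state chain: action 1 moves right, action 0 moves left;
# reaching the last state yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
gamma, alpha, eps = 0.95, 0.1, 0.1

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, float(done), done

Q = np.zeros((n_states, n_actions))
for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy with random tie-breaking
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next, r, done = step(s, a)
        # one-step Q-learning update: Q(s,a) += alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(np.round(Q, 2))
```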

Age-of-information oriented scheduling for multichannel IoT systems with correlated sources

J Tong, L Fu, Z Han - IEEE Transactions on Wireless …, 2022 - ieeexplore.ieee.org
Age-of-information (AoI) based minimization problems have been widely considered in
Internet-of-Things (IoT) networks with the settings of multi-source single-channel systems …
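A minimal sketch of how AoI evolves under a greedy max-age scheduler in a multi-source, multichannel setting; the numbers of sources and channels and the per-channel success probability are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: 4 sources, 2 channels, delivery success probability p.
n_sources, n_channels, p, horizon = 4, 2, 0.8, 10_000

age = np.ones(n_sources)        # current AoI of each source at the monitor
total_age = 0.0

for t in range(horizon):
    # Greedy max-age scheduling: serve the n_channels sources with the largest age.
    served = np.argsort(age)[-n_channels:]
    delivered = served[rng.random(n_channels) < p]
    age += 1.0                   # every source's age grows by one slot
    age[delivered] = 1.0         # a successful delivery resets that source's age
    total_age += age.sum()

print("time-average AoI per source:", total_age / (horizon * n_sources))
```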

Approximate relative value learning for average-reward continuous state MDPs

H Sharma, M Jafarnia-Jahromi… - Uncertainty in Artificial …, 2020 - proceedings.mlr.press
In this paper, we propose an approximate relative value learning (ARVL) algorithm for non-
parametric MDPs with continuous state space and finite actions and average reward …
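The "relative value" idea behind such algorithms can be illustrated with plain relative value iteration on a tiny tabular average-reward MDP; the transition and reward numbers below are made up, and the paper itself treats non-parametric continuous-state problems.

```python
import numpy as np

# Tiny 2-state, 2-action average-reward MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.3, 0.7]]])   # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])                  # R[s, a]

h = np.zeros(2)          # relative value function
ref = 0                  # reference state whose value is pinned to zero
for _ in range(1000):
    Th = (R + P @ h).max(axis=1)   # Bellman backup of the relative values
    gain = Th[ref]                 # estimate of the optimal average reward
    h = Th - gain                  # subtract the reference value => "relative" values

print("average reward estimate:", gain)
print("relative values:", h)
```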

Fitted value iteration in continuous MDPs with state dependent action sets

H Li, S Shao, A Gupta - IEEE Control Systems Letters, 2021 - ieeexplore.ieee.org
In this letter, we establish the convergence of fitted value iteration and fitted Q-value iteration
for continuous-state continuous-action Markov decision problems (MDPs) with state …
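A hedged sketch of fitted Q-value iteration in which the Bellman max is restricted to a state-dependent feasible action set; the 1-d dynamics, features, and feasibility rule are illustrative assumptions, not the letter's setting.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative 1-d continuous-state MDP: state in [0, 1], actions {-0.1, +0.1};
# the positive action is only feasible when s < 0.8.
actions = np.array([-0.1, 0.1])
gamma = 0.9

def feasible(s):                       # state-dependent action set A(s)
    return actions if s < 0.8 else actions[:1]

def step(s, a):
    s_next = np.clip(s + a + 0.02 * rng.standard_normal(), 0.0, 1.0)
    return s_next, -abs(s_next - 0.5)  # reward peaks at s = 0.5

def phi(s, a):                         # simple polynomial features in (s, a)
    return np.array([1.0, s, s * s, a, s * a])

# Collect a fixed batch of transitions, then run fitted Q-value iteration.
batch = []
for _ in range(2000):
    s = rng.random()
    a = rng.choice(feasible(s))
    s_next, r = step(s, a)
    batch.append((s, a, r, s_next))

w = np.zeros(5)                        # linear Q-function weights
for _ in range(50):
    X, y = [], []
    for s, a, r, s_next in batch:
        # Bellman target with the max restricted to the feasible actions at s_next.
        q_next = max(phi(s_next, a2) @ w for a2 in feasible(s_next))
        X.append(phi(s, a))
        y.append(r + gamma * q_next)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

print("fitted weights:", np.round(w, 3))
```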

Some limit properties of Markov chains induced by recursive stochastic algorithms

A Gupta, H Chen, J Pi, G Tendolkar - SIAM Journal on Mathematics of Data …, 2020 - SIAM
Recursive stochastic algorithms have gained significant attention in the recent past due to
data-driven applications. Examples include stochastic gradient descent for solving large …
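A minimal instance of such a recursive stochastic algorithm is stochastic gradient descent with diminishing step sizes on a toy least-squares problem; the data and step-size schedule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Recursive update x_{k+1} = x_k - a_k * noisy_gradient(x_k), illustrated with SGD
# on least squares (synthetic data, not from the paper).
A = rng.standard_normal((500, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.1 * rng.standard_normal(500)

x = np.zeros(3)
for k in range(1, 20_001):
    i = rng.integers(len(b))                     # sample one data point
    grad = (A[i] @ x - b[i]) * A[i]              # stochastic gradient of 0.5*(A[i]x - b[i])^2
    x -= (1.0 / (k + 10)) * grad                 # diminishing step size a_k

print("SGD iterate:", np.round(x, 3), " true:", x_true)
```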

Probabilistic contraction analysis of iterated random operators

A Gupta, R Jain, P Glynn - IEEE Transactions on Automatic …, 2024 - ieeexplore.ieee.org
In many branches of engineering, the Banach contraction mapping theorem is employed to
establish the convergence of certain deterministic algorithms. Randomized versions of these …
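A toy illustration of an iterated random operator: each step applies a Monte Carlo estimate of a deterministic contraction, so the iterates settle in a neighborhood of the fixed point rather than converging exactly. The map and noise model are assumptions for illustration, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Randomized fixed-point iteration: a noisy empirical version of the contraction
# T(x) = 0.5 * x + 1, whose fixed point is x* = 2.
def noisy_T(x, n_samples=200):
    # Monte Carlo estimate of T(x); the noise shrinks as n_samples grows.
    noise = rng.standard_normal(n_samples).mean()
    return 0.5 * x + 1.0 + noise

x = 10.0
for k in range(100):
    x = noisy_T(x)

print("iterate after 100 random-operator applications:", round(x, 3), " fixed point: 2.0")
```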

An approximately optimal relative value learning algorithm for averaged MDPs with continuous states and actions

H Sharma, R Jain - 2019 57th Annual Allerton Conference on …, 2019 - ieeexplore.ieee.org
It has long been a challenging problem to design algorithms for Markov decision processes
(MDPs) with continuous states and actions that are provably approximately optimal and can …

Research on replanning-capable dynamic scheduling of wartime logistics support

曾斌, 樊旭, 李厚朴 - 自动化学报, 2023 - aas.net.cn
The complex and rapidly changing battlefield environment requires rear equipment support to make anticipatory
decisions as battlefield conditions change. To this end, a dynamic scheduling method based on reinforcement
learning is proposed. To describe the support scheduling problem accurately, a preemption-capable scheduling …

Relative Q-learning for Average-Reward Markov Decision Processes with Continuous States

X Yang, J Hu, JQ Hu - IEEE Transactions on Automatic Control, 2024 - ieeexplore.ieee.org
Markov decision processes are widely used for modeling sequential decision-making
problems under uncertainty. We propose an online algorithm for solving a class of average …
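A tabular RVI-style relative Q-learning sketch conveying the kind of update the title refers to; the paper addresses continuous states, so the 2-state MDP, step size, and reference term below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 2-state, 2-action average-reward MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.3, 0.7]]])   # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

Q = np.zeros((2, 2))
s = 0
alpha, eps = 0.05, 0.2
for t in range(100_000):
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
    s_next = int(rng.choice(2, p=P[s, a]))
    r = R[s, a]
    # Relative Q-learning: subtract a reference value (here max_a Q(s0, a)) from the target,
    # which keeps the iterates bounded in the average-reward setting.
    ref = Q[0].max()
    Q[s, a] += alpha * (r + Q[s_next].max() - ref - Q[s, a])
    s = s_next

print("estimated average reward (reference value):", round(Q[0].max(), 3))
print(np.round(Q, 2))
```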

[BOOK][B] Decentralized Multi-Agent Collision Avoidance and Reinforcement Learning

H Li - 2021 - search.proquest.com
This dissertation studies decentralized multi-agent collision avoidance and reinforcement
learning (RL) for Markov decision processes (MDPs) with state-dependent action constraints …