Whittle index based Q-learning for restless bandits with average reward

KE Avrachenkov, VS Borkar - Automatica, 2022 - Elsevier
A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with
average reward, using the paradigms of Q-learning and Whittle index. Specifically, we …

Markov chain optimization of repair and replacement decisions of medical equipment

H Liao, W Cade, S Behdad - Resources, Conservation and Recycling, 2021 - Elsevier
Highlights: The study determines the optimal repair or replacement decision based on the
product lifecycle data. A Markov Decision Process (MDP) framework is proposed for …

Online reinforcement learning of optimal threshold policies for Markov decision processes

A Roy, V Borkar, A Karandikar… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
To overcome the curses of dimensionality and modeling of dynamic programming methods
to solve Markov decision process problems, reinforcement learning (RL) methods are …

A learning algorithm for the Whittle index policy for scheduling web crawlers

K Avrachenkov, VS Borkar - 2019 57th Annual Allerton …, 2019 - ieeexplore.ieee.org
We revisit the Whittle index policy for scheduling web crawlers for ephemeral content
proposed in Avrachenkov and Borkar, IEEE Trans. Control of Network Systems 5 (1), 2016 …

Markovian policy network for efficient robot learning

F Zhang, Y Chen, Z Liu - Neurocomputing, 2022 - Elsevier
What poses a challenge for robot learning is the fact that considerable training samples and
time are required to find an optimal policy in the high-dimensional robot state space. To …

Model-Based Q-Learning with Monotone Policies for Personalized Management of Hypertension

WJ Marrero, L Yi - 2024 Winter Simulation Conference (WSC), 2024 - ieeexplore.ieee.org
Hypertension is a crucial controllable risk factor of atherosclerotic cardiovascular disease, a
leading cause of death in the United States. While traditional analytic techniques may …