[BOOK][B] Constrained Markov decision processes

E Altman - 2021 - taylorfrancis.com
This book provides a unified approach for the study of constrained Markov decision
processes with a finite state space and unbounded costs. Unlike the single controller case …

Zap Q-learning

AM Devraj, S Meyn - Advances in Neural Information …, 2017 - proceedings.neurips.cc
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original
algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed …
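The snippet describes Zap as a matrix-gain algorithm. As a rough illustration of the matrix-gain idea only (not the paper's Q-learning algorithm), here is a minimal sketch of Zap-style stochastic approximation on a hypothetical linear root-finding problem. The gain is the inverse of a running estimate of the mean Jacobian, and that estimate is updated with a step size that decays more slowly than the parameter step. The function name `zap_sa`, the test problem, the Gaussian noise model, and the step sizes alpha_n = 1/n and gamma_n = n^(-0.85) are all assumptions made for illustration.

```python
import numpy as np


def zap_sa(A, b, theta0, n_iter=20_000, noise=0.1, rho=0.85, seed=0):
    """Hypothetical Zap-style SA sketch: find theta with A @ theta = b when only
    noisy samples (A_n, b_n) of A and b are observed."""
    rng = np.random.default_rng(seed)
    d = len(theta0)
    theta = np.asarray(theta0, dtype=float).copy()
    A_hat = np.eye(d)  # running Jacobian estimate; the initial value is an assumption
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n       # step size for theta (slower time scale)
        gamma = n ** (-rho)   # step size for A_hat (faster time scale)
        # Noisy observations of the Jacobian A and the offset b (assumed noise model).
        A_n = A + noise * rng.standard_normal((d, d))
        b_n = b + noise * rng.standard_normal(d)
        f_n = A_n @ theta - b_n  # noisy evaluation of f(theta) = A theta - b
        A_hat += gamma * (A_n - A_hat)
        # Matrix-gain step: theta <- theta - alpha * A_hat^{-1} f_n (Newton-Raphson-like).
        theta -= alpha * np.linalg.solve(A_hat, f_n)
    return theta


A = np.array([[2.0, 0.5], [0.3, 1.5]])
b = np.array([1.0, 2.0])
print(zap_sa(A, b, np.zeros(2)), np.linalg.solve(A, b))
```

In this sketch gamma_n decays more slowly than alpha_n, so the Jacobian estimate averages out noise on a faster time scale than the parameter moves; the printed estimate should be close to the exact solution.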

Two time-scale stochastic approximation with controlled Markov noise and off-policy temporal-difference learning

P Karmakar, S Bhatnagar - Mathematics of Operations …, 2018 - pubsonline.informs.org
We present for the first time an asymptotic convergence analysis of two time-scale stochastic
approximation driven by “controlled” Markov noise. In particular, the faster and slower …
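A two time-scale scheme runs a fast iterate with step size a_n and a slow iterate with step size b_n, where b_n / a_n tends to 0, so the slow iterate effectively sees the fast one at its equilibrium. The toy sketch below assumes a hypothetical problem in which the fast iterate w tracks lambda(theta) = 2*theta + 1 and the slow iterate theta drives lambda(theta) toward 3, so (theta, w) should approach (1, 3). The noise is an uncontrolled AR(1) Markov chain, a simpler setting than the controlled Markov noise analyzed in the paper; the function name `two_timescale` and the step sizes a_n = n^(-0.6), b_n = 1/n are assumptions.

```python
import random


def two_timescale(n_iter=200_000, seed=1):
    """Hypothetical two time-scale SA sketch with AR(1) Markov noise."""
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    xi, zeta = 0.0, 0.0  # states of two independent AR(1) noise chains
    for n in range(1, n_iter + 1):
        a_n = n ** -0.6  # fast step size
        b_n = 1.0 / n    # slow step size; b_n / a_n -> 0
        # Markov (but uncontrolled) noise: each chain depends on its previous value.
        xi = 0.5 * xi + rng.gauss(0.0, 1.0)
        zeta = 0.5 * zeta + rng.gauss(0.0, 1.0)
        # Fast iterate: noisy tracking of lambda(theta) = 2*theta + 1.
        w_new = w + a_n * (2.0 * theta + 1.0 + xi - w)
        # Slow iterate: uses the current fast estimate to push lambda(theta) toward 3.
        theta += b_n * (3.0 - w + zeta)
        w = w_new
    return theta, w


print(two_timescale())
```

Both iterates use values from the previous step, so the fast and slow updates are computed simultaneously; the separation of step sizes, not the update order, is what creates the two time scales.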

On recursive estimation for hidden Markov models

T Rydén - Stochastic Processes and their Applications, 1997 - Elsevier
Hidden Markov models (HMMs) have during the last decade become a widespread tool for
modelling sequences of dependent random variables. In this paper we consider a recursive …

Online statistical inference for nonlinear stochastic approximation with Markovian data

X Li, J Liang, Z Zhang - arXiv preprint arXiv:2302.07690, 2023 - arxiv.org
We study the statistical inference of nonlinear stochastic approximation algorithms utilizing a
single trajectory of Markovian data. Our methodology has practical applications in various …

Fundamental design principles for reinforcement learning algorithms

AM Devraj, A Bušić, S Meyn - Handbook of Reinforcement Learning and …, 2021 - Springer
Along with the sharp increase in visibility of the field, the rate at which new reinforcement
learning algorithms are being proposed is at a new peak. While the surge in activity is …

Fastest convergence for Q-learning

AM Devraj, SP Meyn - arXiv preprint arXiv:1707.03770, 2017 - arxiv.org
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original
algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed …

Fundamental limits of remote estimation of autoregressive Markov processes under communication constraints

J Chakravorty, A Mahajan - 2016 Information Theory and …, 2016 - ieeexplore.ieee.org
The fundamental limits of remote estimation of autoregressive Markov processes under
communication constraints are presented. The remote estimation system consists of a …

The algorithmic learning equations: Evolving strategies in dynamic games

A Cartea, P Chang, J Penalva… - Available at SSRN …, 2022 - papers.ssrn.com
We introduce the algorithmic learning equations, a set of ordinary differential equations
which characterizes the finite-time and asymptotic behavior of the stochastic interaction …

Asynchronous stochastic approximation with differential inclusions

S Perkins, DS Leslie - Stochastic Systems, 2013 - pubsonline.informs.org
The asymptotic pseudo-trajectory approach to stochastic approximation of Benaïm,
Hofbauer and Sorin is extended for asynchronous stochastic approximations with a set …