Sample complexity of variance-reduced distributionally robust Q-learning

S Wang, N Si, J Blanchet, Z Zhou - arXiv preprint arXiv:2305.18420, 2023 - arxiv.org
Dynamic decision making under distributional shifts is of fundamental interest in theory and
applications of reinforcement learning: The distribution of the environment on which the data …

Tight finite time bounds of two-time-scale linear stochastic approximation with Markovian noise

SU Haque, S Khodadadian, ST Maguluri - arXiv preprint arXiv:2401.00364, 2023 - arxiv.org
Stochastic approximation (SA) is an iterative algorithm to find the fixed point of an operator
given noisy samples of this operator. SA appears in many areas such as optimization and …

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

R Srikant - arXiv preprint arXiv:2401.15719, 2024 - arxiv.org
We prove a non-asymptotic central limit theorem for vector-valued martingale differences
using Stein's method, and use Poisson's equation to extend the result to functions of Markov …

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Y Zhang, Q Xie - arXiv preprint arXiv:2401.13884, 2024 - arxiv.org
Stochastic Approximation (SA) is a widely used algorithmic approach in various fields,
including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is …

Stochastic approximation MCMC, online inference, and applications in optimization of queueing systems

X Li, J Liang, X Chen, Z Zhang - arXiv preprint arXiv:2309.09545, 2023 - arxiv.org
Stochastic approximation Markov Chain Monte Carlo (SAMCMC) algorithms are a class of
online algorithms having wide-ranging applications, particularly within Markovian systems …

Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

CH Chao, C Feng, WF Sun, CK Lee, S See… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous
action spaces are typically formulated based on actor-critic frameworks and optimized …

Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale

E Anand, G Qu - arXiv preprint arXiv:2403.00222, 2024 - arxiv.org
We study reinforcement learning for global decision-making in the presence of many local
agents, where the global decision-maker makes decisions affecting all local agents, and the …

Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes

S Wang, J Blanchet, P Glynn - arXiv preprint arXiv:2302.07477, 2023 - arxiv.org
We consider the optimal sample complexity theory of tabular reinforcement learning (RL) for
maximizing the infinite horizon discounted reward in a Markov decision process (MDP) …

Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards

X Li, Q Sun - arXiv preprint arXiv:2303.05606, 2023 - arxiv.org
This paper presents two algorithms, AdaOFUL and VARA, for online sequential
decision-making in the presence of heavy-tailed rewards with only finite variances. For linear …

Functional Central Limit Theorem for Two Timescale Stochastic Approximation

FZ Faizal, V Borkar - arXiv preprint arXiv:2306.05723, 2023 - arxiv.org
Two time scale stochastic approximation algorithms emulate singularly perturbed
deterministic differential equations in a certain limiting sense, i.e., the interpolated iterates on …