A statistical analysis of Polyak-Ruppert averaged Q-learning

S Wang, N Si, J Blanchet, Z Zhou - arXiv preprint arXiv:2305.18420, 2023 - arxiv.org

Dynamic decision making under distributional shifts is of fundamental interest in theory and
applications of reinforcement learning: The distribution of the environment on which the data …

Cited by 9 · Related articles · All 2 versions

[PDF] arxiv.org

Tight finite time bounds of two-time-scale linear stochastic approximation with Markovian noise

SU Haque, S Khodadadian, ST Maguluri - arXiv preprint arXiv:2401.00364, 2023 - arxiv.org

Stochastic approximation (SA) is an iterative algorithm to find the fixed point of an operator
given noisy samples of this operator. SA appears in many areas such as optimization and …
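The snippet's definition of stochastic approximation (iterating toward the fixed point of an operator using only noisy samples of it) can be illustrated with a small sketch. This is not the paper's algorithm. The scalar affine operator, the Gaussian noise, the step-size schedule alpha_k = (k+1)^-0.75, and the function names are all illustrative assumptions.

```python
import random


def stochastic_approximation(noisy_operator, x0, num_iters, step_exponent=0.75, seed=0):
    """Robbins-Monro style iteration: x <- x + alpha_k * (T_hat(x) - x).

    noisy_operator(x, rng) is assumed to return an unbiased noisy sample of T(x).
    Assumed step sizes alpha_k = (k + 1) ** -step_exponent; for
    0.5 < step_exponent <= 1 they satisfy sum(alpha_k) = inf and sum(alpha_k**2) < inf.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    x = x0
    for k in range(num_iters):
        alpha = (k + 1) ** -step_exponent
        # Move a fraction alpha of the way toward the noisy sample of T(x).
        x = x + alpha * (noisy_operator(x, rng) - x)
    return x


# Hypothetical operator T(x) = 0.5 * x + 1: a contraction whose fixed point is x* = 2.
def noisy_affine_operator(x, rng):
    # Exact operator value plus zero-mean Gaussian noise (an assumed noise model).
    return 0.5 * x + 1.0 + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    estimate = stochastic_approximation(noisy_affine_operator, x0=0.0, num_iters=20000)
    print(f"estimate of fixed point: {estimate:.3f} (true value 2.0)")
```

The decaying step size is what averages out the noise: a constant step would leave the iterate fluctuating around x* rather than converging to it.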

Cited by 5 · Related articles · All 2 versions

[PDF] arxiv.org

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

R Srikant - arXiv preprint arXiv:2401.15719, 2024 - arxiv.org

We prove a non-asymptotic central limit theorem for vector-valued martingale differences
using Stein's method, and use Poisson's equation to extend the result to functions of Markov …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Y Zhang, Q Xie - arXiv preprint arXiv:2401.13884, 2024 - arxiv.org

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields,
including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

Stochastic approximation MCMC, online inference, and applications in optimization of queueing systems

X Li, J Liang, X Chen, Z Zhang - arXiv preprint arXiv:2309.09545, 2023 - arxiv.org

Stochastic approximation Markov Chain Monte Carlo (SAMCMC) algorithms are a class of
online algorithms having wide-ranging applications, particularly within Markovian systems …

Cited by 2 · Related articles · All 2 versions

[PDF] arxiv.org

Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

CH Chao, C Feng, WF Sun, CK Lee, S See… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous
action spaces are typically formulated based on actor-critic frameworks and optimized …

Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale

E Anand, G Qu - arXiv preprint arXiv:2403.00222, 2024 - arxiv.org

We study reinforcement learning for global decision-making in the presence of many local
agents, where the global decision-maker makes decisions affecting all local agents, and the …

Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes

S Wang, J Blanchet, P Glynn - arXiv preprint arXiv:2302.07477, 2023 - arxiv.org

We consider the optimal sample complexity theory of tabular reinforcement learning (RL) for
maximizing the infinite horizon discounted reward in a Markov decision process (MDP) …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards

X Li, Q Sun - arXiv preprint arXiv:2303.05606, 2023 - arxiv.org

This paper presents two algorithms, AdaOFUL and VARA, for online sequential decision-
making in the presence of heavy-tailed rewards with only finite variances. For linear …

Functional Central Limit Theorem for Two Timescale Stochastic Approximation

FZ Faizal, V Borkar - arXiv preprint arXiv:2306.05723, 2023 - arxiv.org

Two time scale stochastic approximation algorithms emulate singularly perturbed
deterministic differential equations in a certain limiting sense, i.e., the interpolated iterates on …