A finite-time analysis of Q-learning with neural network function approximation

P Xu, Q Gu - International Conference on Machine Learning, 2020 - proceedings.mlr.press
Q-learning with neural network function approximation (neural Q-learning for short) is
among the most prevalent deep reinforcement learning algorithms. Despite its empirical …

Smart cities using machine learning and intelligent applications

AG Prawiyogi, S Purnama… - … Transactions on Artificial …, 2022 - journal.pandawan.id
The goal of smart cities is to properly manage expanding urbanization, reduce energy
usage, and enhance the economy and quality of life of residents while also preserving the …

Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory

B Hu, U Syed - Advances in neural information processing …, 2019 - proceedings.neurips.cc
In this paper, we provide a unified analysis of temporal difference learning algorithms with
linear function approximators by exploiting their connections to Markov jump linear systems …

Finite-time analysis of decentralized temporal-difference learning with linear function approximation

J Sun, G Wang, GB Giannakis… - International …, 2020 - proceedings.mlr.press
Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering
applications such as networked robotics, swarming drones, and sensor networks, we …

Single-timescale stochastic nonconvex-concave optimization for smooth nonlinear TD learning

S Qiu, Z Yang, X Wei, J Ye, Z Wang - arXiv preprint arXiv:2008.10103, 2020 - arxiv.org
Temporal-Difference (TD) learning with nonlinear smooth function approximation for policy
evaluation has achieved great success in modern reinforcement learning. It is shown that …

Reanalysis of variance reduced temporal difference learning

T Xu, Z Wang, Y Zhou, Y Liang - arXiv preprint arXiv:2001.01898, 2020 - arxiv.org
Temporal difference (TD) learning is a popular algorithm for policy evaluation in
reinforcement learning, but the vanilla TD can substantially suffer from the inherent …

Decentralized TD tracking with linear function approximation and its finite-time analysis

G Wang, S Lu, G Giannakis… - Advances in neural …, 2020 - proceedings.neurips.cc
The present contribution deals with decentralized policy evaluation in multi-agent Markov
decision processes using temporal-difference (TD) methods with linear function …

A single-timescale analysis for stochastic approximation with multiple coupled sequences

H Shen, T Chen - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Stochastic approximation (SA) with multiple coupled sequences has found broad
applications in machine learning such as bilevel learning and reinforcement learning (RL) …

Concentration of contractive stochastic approximation and reinforcement learning

S Chandak, VS Borkar, P Dodhia - Stochastic Systems, 2022 - pubsonline.informs.org
Using a martingale concentration inequality, concentration bounds "from time n_0 on" are
derived for stochastic approximation algorithms with contractive maps and both martingale …

On rademacher complexity-based generalization bounds for deep learning

LV Truong - arXiv preprint arXiv:2208.04284, 2022 - arxiv.org
In this paper, we develop some novel bounds for the Rademacher complexity and the
generalization error in deep learning with i.i.d. and Markov datasets. The new Rademacher …