A finite-time analysis of Q-learning with neural network function approximation

P Xu, Q Gu - International Conference on Machine Learning, 2020 - proceedings.mlr.press
Q-learning with neural network function approximation (neural Q-learning for short) is
among the most prevalent deep reinforcement learning algorithms. Despite its empirical …

Smart cities using machine learning and intelligent applications

AG Prawiyogi, S Purnama… - … Transactions on Artificial …, 2022 - journal.pandawan.id
The goal of smart cities is to properly manage expanding urbanization, reduce energy
usage, and enhance the economy and quality of life of residents while also preserving the …

Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory

B Hu, U Syed - Advances in neural information processing …, 2019 - proceedings.neurips.cc
In this paper, we provide a unified analysis of temporal difference learning algorithms with
linear function approximators by exploiting their connections to Markov jump linear systems …

Finite-time analysis of decentralized temporal-difference learning with linear function approximation

J Sun, G Wang, GB Giannakis… - International …, 2020 - proceedings.mlr.press
Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering
applications such as networked robotics, swarming drones, and sensor networks, we …

Single-timescale stochastic nonconvex-concave optimization for smooth nonlinear TD learning

S Qiu, Z Yang, X Wei, J Ye, Z Wang - arXiv preprint arXiv:2008.10103, 2020 - arxiv.org
Temporal-Difference (TD) learning with nonlinear smooth function approximation for policy
evaluation has achieved great success in modern reinforcement learning. It is shown that …

Reanalysis of variance reduced temporal difference learning

T Xu, Z Wang, Y Zhou, Y Liang - arXiv preprint arXiv:2001.01898, 2020 - arxiv.org
Temporal difference (TD) learning is a popular algorithm for policy evaluation in
reinforcement learning, but the vanilla TD can substantially suffer from the inherent …

Decentralized TD tracking with linear function approximation and its finite-time analysis

G Wang, S Lu, G Giannakis… - Advances in neural …, 2020 - proceedings.neurips.cc
The present contribution deals with decentralized policy evaluation in multi-agent Markov
decision processes using temporal-difference (TD) methods with linear function …

A single-timescale analysis for stochastic approximation with multiple coupled sequences

H Shen, T Chen - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Stochastic approximation (SA) with multiple coupled sequences has found broad
applications in machine learning such as bilevel learning and reinforcement learning (RL) …

Concentration of contractive stochastic approximation and reinforcement learning

S Chandak, VS Borkar, P Dodhia - Stochastic Systems, 2022 - pubsonline.informs.org
Using a martingale concentration inequality, concentration bounds "from time n_0 on" are
derived for stochastic approximation algorithms with contractive maps and both martingale …

On rademacher complexity-based generalization bounds for deep learning

LV Truong - arXiv preprint arXiv:2208.04284, 2022 - arxiv.org
In this paper, we develop some novel bounds for the Rademacher complexity and the
generalization error in deep learning with i.i.d. and Markov datasets. The new Rademacher …