Efficient energy management in smart grids with finite horizon Q-learning

VP Vivek, S Bhatnagar - Sustainable Energy, Grids and Networks, 2024 - Elsevier
Efficient energy distribution in smart grids is an important problem driven by the need to
manage increasing power consumption across the globe. This problem has been studied in …

Actor–Critic or Critic–Actor? A Tale of Two Time Scales

S Bhatnagar, VS Borkar, S Guin - IEEE Control Systems Letters, 2023 - ieeexplore.ieee.org
We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale
stochastic approximation with value function computed on a faster time-scale and policy …

A model-adaptive random search actor critic: convergence analysis and inventory-control case studies

Y Luo, J Hu, A Gosavi - Annals of Operations Research, 2024 - Springer
Reinforcement learning (RL) is an exciting area within the domain of Markov Decision
Processes (MDPs) in which the underlying optimization problem is solved either in a …

Generalized speedy Q-learning

I John, C Kamanchi, S Bhatnagar - IEEE Control Systems …, 2020 - ieeexplore.ieee.org
In this letter, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was
proposed in the Reinforcement Learning (RL) literature to handle slow convergence of …

On exploiting spectral properties for solving MDP with large state space

L Liu, A Chattopadhyay, U Mitra - 2017 55th Annual Allerton …, 2017 - ieeexplore.ieee.org
A large number of systems are well-modeled by Markov Decision Processes (MDPs). In
particular, certain wireless communication networks and biological networks admit such …

Demystifying Approximate Value-based RL with ϵ-greedy Exploration: A Differential Inclusion View

A Gopalan, G Thoppe - arXiv preprint arXiv:2205.13617, 2022 - arxiv.org
Q-learning and SARSA with $\epsilon$-greedy exploration are leading reinforcement
learning methods. Their tabular forms converge to the optimal Q-function under reasonable …

Should You Trust DQN?

A Gopalan, G Thoppe - ICML 2024 Workshop: Aligning Reinforcement … - openreview.net
For a Reinforcement Learning (RL) algorithm to be practically useful, the policy it estimates
in the limit must be superior to the initial guess, at least on average. In this work, we show …

Demystifying Approximate Value-based RL with ϵ-greedy Exploration: A Differential Inclusion View

A Gopalan, G Thoppe - researchgate.net
Q-learning and SARSA with ϵ-greedy exploration are leading reinforcement learning
methods. Their tabular forms converge to the optimal Q-function under reasonable …

Demystifying Approximate RL with ϵ-greedy Exploration: A Differential Inclusion View

A Gopalan, G Thoppe - openreview.net
Q-learning and SARSA(0) with $\epsilon$-greedy exploration are leading reinforcement
learning methods, and their tabular forms converge to the optimal Q-function under …

[CITATION] Gradient-based algorithms for zeroth-order optimization

LA Prashanth, S Bhatnagar - 2024 - Now publishers