On the sample complexity of actor-critic method for reinforcement learning with function...

Reinforcement learning for selective key applications in power systems: Recent advances and future challenges

X Chen, G Qu, Y Tang, S Low… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

With large-scale integration of renewable generation and distributed energy resources,
modern power systems are confronted with new operational challenges, such as growing …

Cited by: 200 | Related articles | All 6 versions

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by: 143 | Related articles | All 13 versions

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by: 92 | Related articles | All 10 versions

CRPO: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press

In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Cited by: 131 | Related articles | All 7 versions

A finite-time analysis of two time-scale actor-critic methods

YF Wu, W Zhang, P Xu, Q Gu - Advances in Neural …, 2020 - proceedings.neurips.cc

Actor-critic (AC) methods have exhibited great empirical success compared with other
reinforcement learning algorithms, where the actor uses the policy gradient to improve the …

Cited by: 142 | Related articles | All 7 versions

Improving sample complexity bounds for (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - Advances in Neural Information …, 2020 - proceedings.neurips.cc

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement
learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and …

Cited by: 114 | Related articles | All 8 versions

On finite-time convergence of actor-critic algorithm

S Qiu, Z Yang, J Ye, Z Wang - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org

Actor-critic algorithm and their extensions have made great achievements in real-world
decision-making problems. In contrast to its empirical success, the theoretical understanding …

Cited by: 78 | Related articles | All 2 versions

Learning multi-agent behaviors from distributed and streaming demonstrations

S Liu, M Zhu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc

This paper considers the problem of inferring the behaviors of multiple interacting experts by
estimating their reward functions and constraints where the distributed demonstrated …

Cited by: 9 | Related articles | All 3 versions

[PDF] Reinforcement learning for decision-making and control in power systems: Tutorial, review, and vision

X Chen, G Qu, Y Tang, S Low… - arXiv preprint arXiv …, 2021 - authors.library.caltech.edu

With large-scale integration of renewable generation and distributed energy resources
(DERs), modern power systems are confronted with new operational challenges, such as …

Cited by: 76 | Related articles | All 2 versions

A general sample complexity analysis of vanilla policy gradient

R Yuan, RM Gower, A Lazaric - International Conference on …, 2022 - proceedings.mlr.press

We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in
non-convex optimization to obtain convergence and sample complexity guarantees for the …

Cited by: 54 | Related articles | All 10 versions
