Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: A tutorial

A Feriani, E Hossain - IEEE Communications Surveys & …, 2021 - ieeexplore.ieee.org
Deep Reinforcement Learning (DRL) has recently witnessed significant advances that have
led to multiple successes in solving sequential decision-making problems in various …

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques of data processing and data analysis and brought new …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics, 2024, Vol. 52, No. 1, pp. 233–260. https://doi.org/10.1214/23-AOS2342 …

Is Q-learning minimax optimal? a tight sample complexity analysis

G Li, C Cai, Y Chen, Y Wei, Y Chi - Operations Research, 2024 - pubsonline.informs.org
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP)
in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the …
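
The update at the heart of this line of work is the tabular Q-learning rule, Q(s,a) ← Q(s,a) + α(r + γ·maxₐ′ Q(s′,a′) − Q(s,a)). As a hedged illustration only (the toy MDP, step sizes, and exploration schedule below are invented for this sketch and are not taken from the paper):

```python
import random

# Toy 2-state, 2-action MDP (hypothetical, for illustration):
# action 1 moves to state 1, which pays reward 1; action 0 moves to state 0.
def step(s, a):
    if a == 1:
        return 1, (1.0 if s == 1 else 0.0)
    return 0, 0.0

gamma, alpha, eps = 0.9, 0.1, 0.2
Q = [[0.0, 0.0], [0.0, 0.0]]   # Q[s][a]
random.seed(0)
s = 0
for _ in range(5000):
    # epsilon-greedy behavior policy along a single trajectory
    if random.random() < eps:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda b: Q[s][b])
    s2, r = step(s, a)
    # model-free Q-learning update toward the bootstrapped target
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = s2
```

Under the discount γ = 0.9 the optimal values here are Q*(1,1) = 1/(1−γ) = 10 and Q*(0,1) = γ·10 = 9, so after enough steps the greedy policy takes action 1 in both states.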

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Gu, Y Chi - Advances in Neural …, 2021 - proceedings.neurips.cc
Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …

Sample complexity of asynchronous Q-learning: Sharper analysis and variance reduction

G Li, Y Wei, Y Chi, Y Gu, Y Chen - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a
Markov decision process (MDP), based on a single trajectory of Markovian samples induced …

Finite-time analysis of single-timescale actor-critic

X Chen, L Zhao - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Actor-critic methods have achieved significant success in many challenging applications.
However, their finite-time convergence is still poorly understood in the most practical single …
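
"Single-timescale" here means the actor and critic are updated simultaneously with step sizes of the same order, rather than the critic running on a much faster schedule. A hedged sketch of that scheme (the toy MDP, tabular critic, softmax actor, and all step sizes below are invented for illustration and are not the paper's setting):

```python
import math
import random

# Toy 2-state, 2-action MDP (hypothetical): action 1 leads to state 1,
# which pays reward 1; action 0 leads to state 0 with reward 0.
def step(s, a):
    return (1, 1.0 if s == 1 else 0.0) if a == 1 else (0, 0.0)

def softmax_probs(theta, s):
    # numerically stable softmax over the two action logits
    m = max(theta[s])
    z = [math.exp(t - m) for t in theta[s]]
    tot = sum(z)
    return [v / tot for v in z]

gamma = 0.9
alpha_critic, alpha_actor = 0.05, 0.02   # same order: single timescale
V = [0.0, 0.0]                           # critic: tabular state values
theta = [[0.0, 0.0], [0.0, 0.0]]         # actor: softmax logits theta[s][a]
random.seed(1)
s = 0
for _ in range(20000):
    probs = softmax_probs(theta, s)
    a = 0 if random.random() < probs[0] else 1
    s2, r = step(s, a)
    delta = r + gamma * V[s2] - V[s]     # TD(0) error
    V[s] += alpha_critic * delta         # critic update
    for b in (0, 1):                     # actor: score-function gradient step
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_actor * delta * grad
    s = s2
```

Because both updates share each step's TD error, the critic's estimation error feeds directly into the actor's gradient at every iteration, which is the coupling that makes the finite-time analysis of this regime harder than the two-timescale case.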

Task-Oriented Satellite-UAV Networks With Mobile Edge Computing

P Wei, W Feng, Y Chen, N Ge… - IEEE Open Journal of …, 2023 - ieeexplore.ieee.org
Networked robots have become crucial for unmanned applications since they can
collaborate to complete complex tasks in remote/hazardous/depopulated areas. Due to the …