Proximal gradient temporal difference learning: Stable reinforcement learning with polynomial...

F Leon, M Gavrilescu - Mathematics, 2021 - mdpi.com

This paper provides a literature review of some of the most important concepts, techniques,
and methodologies used within autonomous car systems. Specifically, we focus on two …

Cited by 157 - Related articles - All 5 versions

[PDF] arxiv.org

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

Cited by 311 - Related articles - All 5 versions

[PDF] github.io

On finite-time convergence of actor-critic algorithm

S Qiu, Z Yang, J Ye, Z Wang - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org

Actor-critic algorithm and their extensions have made great achievements in real-world
decision-making problems. In contrast to its empirical success, the theoretical understanding …

Cited by 88 - Related articles - All 2 versions

[PDF] arxiv.org

A review of tracking, prediction and decision making methods for autonomous driving

F Leon, M Gavrilescu - arXiv preprint arXiv:1909.07707, 2019 - arxiv.org

This literature review focuses on three important aspects of an autonomous car system:
tracking (assessing the identity of the actors such as cars, pedestrians or obstacles in a …

Cited by 39 - Related articles - All 3 versions

[PDF] neurips.cc

Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning

H Gupta, R Srikant, L Ying - Advances in Neural …, 2019 - proceedings.neurips.cc

We study two time-scale linear stochastic approximation algorithms, which can be used to
model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC. We …

Cited by 104 - Related articles - All 11 versions

[PDF] neurips.cc

Taming communication and sample complexities in decentralized policy evaluation for cooperative multi-agent reinforcement learning

X Zhang, Z Liu, J Liu, Z Zhu… - Advances in Neural …, 2021 - proceedings.neurips.cc

Cooperative multi-agent reinforcement learning (MARL) has received increasing attention in
recent years and has found many scientific and engineering applications. However, a key …

Cited by 31 - Related articles - All 8 versions

[PDF] neurips.cc

A block coordinate ascent algorithm for mean-variance optimization

T Xie, B Liu, Y Xu, M Ghavamzadeh… - Advances in …, 2018 - proceedings.neurips.cc

Risk management in dynamic decision problems is a primary concern in many fields,
including financial investment, autonomous driving, and healthcare. The mean-variance …

Cited by 40 - Related articles - All 10 versions

[PDF] mlr.press

Modified retrace for off-policy temporal difference learning

X Chen, X Ma, Y Li, G Yang… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press

Off-policy learning is a key to extend reinforcement learning as it allows to learn a target
policy from a different behavior policy that generates the data. However, it is well known as …

Cited by 3 - Related articles - All 5 versions

[PDF] neurips.cc

Continual auxiliary task learning

M McLeod, C Lo, M Schlegel… - Advances in …, 2021 - proceedings.neurips.cc

Learning auxiliary tasks, such as multiple predictions about the world, can provide many
benefits to reinforcement learning systems. A variety of off-policy learning algorithms have …

Cited by 9 - Related articles - All 8 versions

[PDF] researchgate.net

Exploring reinforcement learning techniques in the realm of mobile robotics

Z Haider, MZ Sardar, AT Azar… - … of Automation and …, 2024 - inderscienceonline.com

Mobile robots are intelligent machines that can move and perform tasks in different
environments. The key factor enabling the autonomy of mobile robots lies in the reliability …

Cited by 1 - Related articles - All 3 versions
