Conservative exploration in reinforcement learning

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier

This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

Cited by 366 - Related articles - All 5 versions

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 202 - Related articles - All 13 versions

Exploration-exploitation in constrained mdps

Y Efroni, S Mannor, M Pirotta - arXiv preprint arXiv:2003.02189, 2020 - arxiv.org

In many sequential decision-making problems, the goal is to optimize a utility function while
satisfying a set of constraints on different utilities. This learning problem is formalized …

Cited by 176 - Related articles - All 2 versions

Safe reinforcement learning with linear function approximation

S Amani, C Thrampoulidis… - … Conference on Machine …, 2021 - proceedings.mlr.press

Safety in reinforcement learning has become increasingly important in recent years. Yet,
existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to …

Cited by 44 - Related articles - All 5 versions

Anytime-competitive reinforcement learning with policy prior

J Yang, P Li, T Li, A Wierman… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP).
Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the …

Cited by 2 - Related articles - All 6 versions

Adaptive deep reinforcement learning for non-stationary environments

J Zhu, Y Wei, Y Kang, X Jiang, GE Dullerud - Science China Information …, 2022 - Springer

Deep reinforcement learning (DRL) is currently used to solve Markov decision process
problems for which the environment is typically assumed to be stationary. In this paper, we …

Cited by 13 - Related articles - All 5 versions

Smoothing policies and safe policy gradients

M Papini, M Pirotta, M Restelli - Machine Learning, 2022 - Springer

Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …

Cited by 43 - Related articles - All 11 versions

Near-optimal conservative exploration in reinforcement learning under episode-wise constraints

D Li, R Huang, C Shen, J Yang - … Conference on Machine …, 2023 - proceedings.mlr.press

This paper investigates conservative exploration in reinforcement learning where the
performance of the learning agent is guaranteed to be above a certain threshold throughout …

Cited by 3 - Related articles - All 9 versions

Learning for edge-weighted online bipartite matching with robustness guarantees

P Li, J Yang, S Ren - International Conference on Machine …, 2023 - proceedings.mlr.press

Many problems, such as online ad display, can be formulated as online bipartite matching.
The crucial challenge lies in the nature of sequentially-revealed online item information …

Cited by 6 - Related articles - All 9 versions

A provably efficient sample collection strategy for reinforcement learning

J Tarbouriech, M Pirotta, M Valko… - Advances in Neural …, 2021 - proceedings.neurips.cc

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade
off the exploration of the environment and the exploitation of the samples to optimize its …

Cited by 18 - Related articles - All 12 versions
