Exploration in deep reinforcement learning: A survey

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier
This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Exploration-exploitation in constrained MDPs

Y Efroni, S Mannor, M Pirotta - arXiv preprint arXiv:2003.02189, 2020 - arxiv.org
In many sequential decision-making problems, the goal is to optimize a utility function while
satisfying a set of constraints on different utilities. This learning problem is formalized …

Safe reinforcement learning with linear function approximation

S Amani, C Thrampoulidis… - … Conference on Machine …, 2021 - proceedings.mlr.press
Safety in reinforcement learning has become increasingly important in recent years. Yet,
existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to …

Anytime-competitive reinforcement learning with policy prior

J Yang, P Li, T Li, A Wierman… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP).
Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the …

Adaptive deep reinforcement learning for non-stationary environments

J Zhu, Y Wei, Y Kang, X Jiang, GE Dullerud - Science China Information …, 2022 - Springer
Deep reinforcement learning (DRL) is currently used to solve Markov decision process
problems for which the environment is typically assumed to be stationary. In this paper, we …

Smoothing policies and safe policy gradients

M Papini, M Pirotta, M Restelli - Machine Learning, 2022 - Springer
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …

Near-optimal conservative exploration in reinforcement learning under episode-wise constraints

D Li, R Huang, C Shen, J Yang - … Conference on Machine …, 2023 - proceedings.mlr.press
This paper investigates conservative exploration in reinforcement learning where the
performance of the learning agent is guaranteed to be above a certain threshold throughout …

Learning for edge-weighted online bipartite matching with robustness guarantees

P Li, J Yang, S Ren - International Conference on Machine …, 2023 - proceedings.mlr.press
Many problems, such as online ad display, can be formulated as online bipartite matching.
The crucial challenge lies in the nature of sequentially-revealed online item information …

A provably efficient sample collection strategy for reinforcement learning

J Tarbouriech, M Pirotta, M Valko… - Advances in Neural …, 2021 - proceedings.neurips.cc
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade
off the exploration of the environment and the exploitation of the samples to optimize its …