Dueling RL: Reinforcement learning with trajectory preferences

A Saha, A Pacchiano, J Lee - International Conference on …, 2023 - proceedings.mlr.press
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …

Fast active learning for pure exploration in reinforcement learning

P Ménard, OD Domingues, A Jonsson… - International …, 2021 - proceedings.mlr.press
Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, the feedback, in the beginning, can be completely absent …

Efficient model-based reinforcement learning through optimistic policy search and planning

S Curi, F Berkenkamp, A Krause - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …

Dueling RL: Reinforcement learning with trajectory preferences

A Pacchiano, A Saha, J Lee - arXiv preprint arXiv:2111.04850, 2021 - arxiv.org
We consider the problem of preference based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning, an agent receives feedback only in terms of a 1 bit (0/1) …

Advances in preference-based reinforcement learning: A review

Y Abdelkareem, S Shehata… - 2022 IEEE international …, 2022 - ieeexplore.ieee.org
Reinforcement Learning (RL) algorithms suffer from the dependency on accurately
engineered reward functions to properly guide the learning agents to do the required tasks …

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

UCB Momentum Q-learning: Correcting the bias without forgetting

P Ménard, OD Domingues, X Shang… - … on Machine Learning, 2021 - proceedings.mlr.press
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new
algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic …

Reinforcement learning with function approximation: From linear to nonlinear

J Long, J Han - arXiv preprint arXiv:2302.09703, 2023 - arxiv.org
Function approximation has been an indispensable component in modern reinforcement
learning algorithms designed to tackle problems with large state spaces in high dimensions …

Adaptive discretization in online reinforcement learning

SR Sinclair, S Banerjee, CL Yu - Operations Research, 2023 - pubsonline.informs.org
Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A …