- Academic resource search

Dueling RL: Reinforcement learning with trajectory preferences

A Saha, A Pacchiano, J Lee - International Conference on …, 2023 - proceedings.mlr.press

We consider the problem of preference-based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning (RL), an agent receives feedback only in terms of 1 bit …

Cited by 34 · Related articles

[PDF] mlr.press

Fast active learning for pure exploration in reinforcement learning

P Ménard, OD Domingues, A Jonsson… - International …, 2021 - proceedings.mlr.press

Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, the feedback, in the beginning, can be completely absent …

Cited by 88 · Related articles · All 7 versions

[PDF] neurips.cc

Efficient model-based reinforcement learning through optimistic policy search and planning

S Curi, F Berkenkamp, A Krause - Advances in Neural …, 2020 - proceedings.neurips.cc

Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …

Cited by 106 · Related articles · All 7 versions

[PDF] arxiv.org

Dueling RL: Reinforcement learning with trajectory preferences

A Pacchiano, A Saha, J Lee - arXiv preprint arXiv:2111.04850, 2021 - arxiv.org

We consider the problem of preference based reinforcement learning (PbRL), where, unlike
traditional reinforcement learning, an agent receives feedback only in terms of a 1 bit (0/1) …

Cited by 51 · Related articles · All 2 versions

[PDF] arxiv.org

Advances in preference-based reinforcement learning: A review

Y Abdelkareem, S Shehata… - 2022 IEEE international …, 2022 - ieeexplore.ieee.org

Reinforcement Learning (RL) algorithms suffer from the dependency on accurately
engineered reward functions to properly guide the learning agents to do the required tasks …

Cited by 11 · Related articles · All 2 versions

[PDF] mlr.press

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press

We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Cited by 14 · Related articles · All 9 versions

[PDF] mlr.press

UCB momentum Q-learning: Correcting the bias without forgetting

P Ménard, OD Domingues, X Shang… - … on Machine Learning, 2021 - proceedings.mlr.press

We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new
algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic …

Cited by 49 · Related articles · All 13 versions

[PDF] arxiv.org

Reinforcement learning with function approximation: From linear to nonlinear

J Long, J Han - arXiv preprint arXiv:2302.09703, 2023 - arxiv.org

Function approximation has been an indispensable component in modern reinforcement
learning algorithms designed to tackle problems with large state spaces in high dimensions …

Cited by 4 · Related articles · All 3 versions

[PDF] nsf.gov

Adaptive discretization in online reinforcement learning

SR Sinclair, S Banerjee, CL Yu - Operations Research, 2023 - pubsonline.informs.org

Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …

Cited by 22 · Related articles · All 6 versions

[PDF] neurips.cc

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc

We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $ H $ with $ S $ states, and $ A …

Cited by 9 · Related articles · All 10 versions
