Eureka: Human-level reward design via coding large language models

YJ Ma, W Liang, G Wang, DA Huang, O Bastani… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have excelled as high-level semantic planners for
sequential decision-making tasks. However, harnessing them to learn complex low-level …
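
A minimal sketch of the general pattern this abstract points at: candidate reward functions expressed as code (here hard-coded strings standing in for LLM proposals) are each used to drive behavior and then scored against the true task metric, keeping the best. The toy 1-D task, the greedy stand-in for RL training, and the candidate list are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch: selecting among reward functions written as code, Eureka-style.
# The candidate strings stand in for LLM proposals; the 1-D "reach +5" task
# and the greedy one-step policy are illustrative placeholders only.
import random


def rollout_score(reward_fn, episodes=50, horizon=30):
    """Roll out a greedy policy that follows the candidate reward on a toy
    1-D 'reach position 5' task; return the true task success rate."""
    successes = 0
    for _ in range(episodes):
        pos = 0.0
        for _ in range(horizon):
            # Greedy one-step policy acting as a cheap stand-in for RL training;
            # ties are broken at random.
            step = max((-1.0, 1.0),
                       key=lambda a: (reward_fn(pos + a), random.random()))
            pos += step
            if pos >= 5.0:
                successes += 1
                break
    return successes / episodes


# Hypothetical stand-ins for LLM-generated reward code.
candidate_sources = [
    "def reward(pos): return 1.0 if pos >= 5.0 else 0.0",  # sparse completion
    "def reward(pos): return -abs(5.0 - pos)",             # distance shaping
    "def reward(pos): return pos",                         # progress shaping
]

best_src, best_score = None, float("-inf")
for src in candidate_sources:
    namespace = {}
    exec(src, namespace)                     # compile the candidate reward
    score = rollout_score(namespace["reward"])
    if score > best_score:
        best_src, best_score = src, score

print("selected candidate:", best_src, "| true task score:", best_score)
```

In this toy, the shaped candidates reach the goal far more reliably than the sparse one, which is the kind of signal a search over generated reward code can exploit.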

Evolutionary reinforcement learning: A survey

H Bai, R Cheng, Y Jin - Intelligent Computing, 2023 - spj.science.org
Reinforcement learning (RL) is a machine learning approach that trains agents to maximize
cumulative rewards through interactions with environments. The integration of RL with deep …
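
One of the integrations such a survey covers can be sketched minimally as an evolution strategy that mutates policy parameters and selects purely on episodic return. The (1+λ) scheme, the toy point-mass task, and all hyperparameters below are illustrative assumptions, not taken from the survey.

```python
"""Minimal (1+lambda) evolution strategy for RL: policy parameters are
mutated and selected on episodic return alone (toy illustration)."""
import numpy as np

rng = np.random.default_rng(0)


def episode_return(theta, horizon=20):
    """Toy task: drive a 2-D point to the origin with a linear controller
    a = -theta * s; reward is negative squared distance, summed over time."""
    s = rng.normal(size=2)
    total = 0.0
    for _ in range(horizon):
        a = -theta * s                      # elementwise linear controller
        s = s + 0.1 * a + 0.01 * rng.normal(size=2)
        total += -float(s @ s)              # cumulative reward to maximize
    return total


theta = np.zeros(2)                          # parent policy parameters
for gen in range(200):
    # lambda = 8 mutated offspring per generation
    offspring = [theta + 0.1 * rng.normal(size=2) for _ in range(8)]
    fitness = [np.mean([episode_return(t) for _ in range(5)]) for t in offspring]
    best = int(np.argmax(fitness))
    if fitness[best] >= np.mean([episode_return(theta) for _ in range(5)]):
        theta = offspring[best]              # elitist replacement
print("evolved policy parameters:", theta)
```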

The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications

S Booth, WB Knox, J Shah, S Niekum, P Stone… - Proceedings of the …, 2023 - ojs.aaai.org
In reinforcement learning (RL), a reward function that aligns exactly with a task's true
performance metric is often necessarily sparse. For example, a true task metric might …

Auto MC-Reward: Automated dense reward design with large language models for Minecraft

H Li, X Yang, Z Wang, X Zhu, J Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that
indicate task completion or failure with binary values. The challenge in exploration efficiency …
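
The sparse-versus-dense distinction in this abstract can be illustrated with a toy contrast (a small grid walk, not the paper's Minecraft tasks): the binary completion reward is silent along the whole trajectory, while a shaped distance-based reward gives a signal at every step.

```python
"""Contrast a sparse task-completion reward with a dense, shaped reward
on a toy grid walk (illustrative only)."""

GOAL = (3, 3)


def sparse_reward(state):
    # Binary signal: 1 only on task completion, 0 everywhere else.
    return 1.0 if state == GOAL else 0.0


def dense_reward(state):
    # Shaped signal: closer to the goal is better, so every step is informative.
    gx, gy = GOAL
    x, y = state
    return -(abs(gx - x) + abs(gy - y))     # negative Manhattan distance


trajectory = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
for s in trajectory:
    print(s, "sparse:", sparse_reward(s), "dense:", dense_reward(s))
```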

Preference transformer: Modeling human preferences using transformers for RL

C Kim, J Park, J Shin, H Lee, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
Preference-based reinforcement learning (RL) provides a framework to train agents using
human preferences between two behaviors. However, preference-based RL has been …
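
As background for the preference-based setting, a common recipe is to fit a reward model to pairwise segment comparisons with a Bradley-Terry likelihood. The sketch below uses synthetic preferences and a linear per-step reward model rather than the transformer architecture the paper proposes.

```python
"""Fit a tiny reward model to pairwise trajectory-segment preferences with
the Bradley-Terry likelihood used in preference-based RL (toy data only)."""
import numpy as np

rng = np.random.default_rng(1)


def segment(length=10):
    """A trajectory segment as a sequence of 3-D feature vectors."""
    return rng.normal(size=(length, 3))


# Synthetic preferences: segment a is preferred when its hidden "true"
# return (sum of the first feature) is larger.
pairs = []
for _ in range(200):
    a, b = segment(), segment()
    label = 1.0 if a[:, 0].sum() > b[:, 0].sum() else 0.0   # 1 => a preferred
    pairs.append((a, b, label))

w = np.zeros(3)                        # linear per-step reward r(s) = w . s
lr = 0.05
for _ in range(300):
    grad = np.zeros(3)
    for a, b, label in pairs:
        ra, rb = a @ w, b @ w          # per-step rewards under the model
        logit = ra.sum() - rb.sum()    # difference of segment returns
        p_a = 1.0 / (1.0 + np.exp(-logit))   # Bradley-Terry P(a preferred)
        # Gradient of the negative log-likelihood with respect to w
        grad += (p_a - label) * (a.sum(axis=0) - b.sum(axis=0))
    w -= lr * grad / len(pairs)

print("learned reward weights:", w)    # should emphasize the first feature
```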

Evolving reinforcement learning algorithms

JD Co-Reyes, Y Miao, D Peng, E Real, S Levine… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose a method for meta-learning reinforcement learning algorithms by searching
over the space of computational graphs which compute the loss function for a value-based …
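
A heavily simplified version of the search space described here: each candidate is a small expression over standard value-based quantities (current estimate, reward, bootstrapped next value, discount) used as the tabular update error, and candidates are scored by the greedy return they produce on a toy chain task. The primitive set, the chain environment, and the three hand-written candidates stand in for the paper's evolved computational graphs.

```python
"""Score candidate value-update rules on a toy 5-state chain; only the
standard TD error should solve the task (illustrative simplification)."""
import random

N = 5                                   # chain states 0..4, goal at 4


def step(s, a):
    """Actions: 0 = left, 1 = right. Reward 1 only when reaching the goal."""
    s2 = min(N - 1, s + 1) if a == 1 else max(0, s - 1)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1


def train_and_score(error_fn, episodes=300, lr=0.5, gamma=0.9):
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = random.randrange(2)     # random behaviour policy (off-policy)
            s2, r, done = step(s, a)
            q_next = 0.0 if done else max(Q[s2])
            Q[s][a] += lr * error_fn(Q[s][a], r, q_next, gamma)  # candidate update
            s = s2
            if done:
                break
    # Score: undiscounted return of the greedy policy from the start state.
    s, total = 0, 0.0
    for _ in range(20):
        s, r, done = step(s, int(Q[s][1] > Q[s][0]))
        total += r
        if done:
            break
    return total


candidates = {
    "td_error":    lambda q, r, qn, g: r + g * qn - q,  # standard Q-learning error
    "myopic":      lambda q, r, qn, g: r - q,           # drops the bootstrap term
    "reward_only": lambda q, r, qn, g: r,               # ignores the value estimate
}
print({name: train_and_score(fn) for name, fn in candidates.items()})
```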

Automated reinforcement learning: An overview

RR Afshar, Y Zhang, J Vanschoren… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning and, more recently, Deep Reinforcement Learning are popular methods
for solving sequential decision-making problems modeled as Markov Decision Processes …
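
For reference, the Markov Decision Process framing mentioned here reduces to a handful of objects (states, actions, transition probabilities, rewards, discount). The two-state MDP and value-iteration loop below are an arbitrary illustration, not drawn from the overview.

```python
"""Value iteration on a tiny 2-state MDP (numbers are arbitrary)."""

# P[s][a] = list of (probability, next_state, reward)
P = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],    # safe action, small chance of progress
        1: [(0.4, 0, 0.0), (0.6, 1, 1.0)]},   # riskier action, better odds
    1: {0: [(1.0, 1, 2.0)],                   # stay in the rewarding state
        1: [(1.0, 0, 0.0)]},
}
gamma = 0.95

V = {s: 0.0 for s in P}
for _ in range(500):                          # Bellman optimality backups
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print("V* =", V, "greedy policy =", policy)
```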

Sequential preference ranking for efficient reinforcement learning from human feedback

M Hwang, G Lee, H Kee, CW Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning from human feedback (RLHF) alleviates the problem of designing a
task-specific reward function in reinforcement learning by learning it from human preference …

TTOpt: A maximum volume quantized tensor train-based optimization and its application to reinforcement learning

K Sozykin, A Chertkov, R Schutski… - Advances in …, 2022 - proceedings.neurips.cc
We present a novel procedure for optimization based on the combination of efficient
quantized tensor train representation and a generalized maximum matrix volume principle …

Open Source Vizier: Distributed infrastructure and API for reliable and flexible blackbox optimization

X Song, S Perel, C Lee, G Kochanski… - International …, 2022 - proceedings.mlr.press
Vizier is the de facto blackbox optimization service across Google, having optimized some of
Google's largest products and research efforts. To operate at the scale of tuning thousands …