A practical guide to multi-objective reinforcement learning and planning

SK Jagatheesaperumal, QV Pham… - IEEE Open Journal …, 2022 - ieeexplore.ieee.org

Explainable Artificial Intelligence (XAI) is transforming the field of Artificial Intelligence (AI) by
enhancing the trust of end-users in machines. As the number of connected devices keeps on …

Cited by 78 · All 3 versions

[PDF] wiley.com

A state‐of‐the‐art review of optimal reservoir control for managing conflicting demands in a changing world

M Giuliani, JR Lamontagne, PM Reed… - Water Resources …, 2021 - Wiley Online Library

The state of the art for optimal water reservoir operations is rapidly evolving, driven by
emerging societal challenges. Changing values for balancing environmental resources …

Cited by 89 · All 9 versions

[PDF] neurips.cc

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc

Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …

Cited by 89 · All 7 versions
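The title above describes interpolating weights fine-tuned on different rewards. A minimal sketch of linear weight interpolation follows, assuming every expert shares one architecture (parameter names and shapes match) and that models are represented here as plain dicts of float lists; the name `interpolate_weights` is hypothetical and this is not the paper's code.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def interpolate_weights(experts: Sequence[Params], coeffs: Sequence[float]) -> Params:
    """Return sum_i coeffs[i] * experts[i], computed parameter by parameter.

    Assumes all experts have identical parameter names and shapes.
    """
    if not experts or len(experts) != len(coeffs):
        raise ValueError("need exactly one coefficient per expert")
    # Convex combination: non-negative coefficients that sum to 1.
    if any(c < 0 for c in coeffs) or abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    merged: Params = {}
    for name in experts[0]:
        tensors = [expert[name] for expert in experts]
        size = len(tensors[0])
        merged[name] = [
            sum(c * t[j] for c, t in zip(coeffs, tensors)) for j in range(size)
        ]
    return merged


# Two toy "experts", each fine-tuned (hypothetically) on a different reward.
expert_a = {"w": [1.0, 0.0]}
expert_b = {"w": [0.0, 1.0]}
print(interpolate_weights([expert_a, expert_b], [0.25, 0.75]))
```

Sweeping the coefficient between 0 and 1 for two experts yields a family of merged models, one per trade-off between the two rewards.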

[PDF] mlr.press

Multi-objective GFlowNets

M Jain, SC Raparthy… - International …, 2023 - proceedings.mlr.press

We study the problem of generating diverse candidates in the context of Multi-Objective
Optimization. In many applications of machine learning such as drug discovery and material …

Cited by 65 · All 7 versions
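Several entries here refer to Pareto optimality in multi-objective optimization. As a hedged illustration of that notion only (not the method of any listed paper), the sketch below keeps the non-dominated candidates from a list of objective vectors, assuming every objective is to be maximized; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Vector]) -> List[Vector]:
    """Return the points not dominated by any other point (O(n^2) check)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.0, 2.0), (0.0, 0.0)]
print(pareto_front(candidates))  # (2, 2) and (0, 0) are dominated
```

The quadratic pairwise check is chosen for clarity; larger candidate sets would normally use a sort-based method.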

[PDF] arxiv.org

Personalized soups: Personalized large language model alignment via post-hoc parameter merging

J Jang, S Kim, BY Lin, Y Wang, J Hessel… - arXiv preprint arXiv …, 2023 - arxiv.org

While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning …

Cited by 78 · All 2 versions

[PDF] springer.com

Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

P Vamplew, BJ Smith, J Källström, G Ramos… - Autonomous Agents and …, 2022 - Springer

The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the
concept of reward maximisation is sufficient to underpin all intelligence, both natural and …

Cited by 83 · All 17 versions

[PDF] cell.com

Goals, usefulness and abstraction in value-based choice

B De Martino, A Cortese - Trends in Cognitive Sciences, 2023 - cell.com

Colombian drug lord Pablo Escobar, while on the run, purportedly burned two
million dollars in banknotes to keep his daughter warm. A stark reminder that, in life …

Cited by 38 · All 7 versions

[PDF] mdpi.com

A review of deep reinforcement learning approaches for smart manufacturing in industry 4.0 and 5.0 framework

A del Real Torres, DS Andreiana, Á Ojeda Roldán… - Applied Sciences, 2022 - mdpi.com

In this review, the industry's current issues regarding intelligent manufacture are presented.
This work presents the status and the potential for the I4.0 and I5.0's revolutionary …

Cited by 49 · All 5 versions

[PDF] mlr.press

Optimistic linear support and successor features as a basis for optimal policy transfer

LN Alegre, A Bazzan… - … conference on machine …, 2022 - proceedings.mlr.press

In many real-world applications, reinforcement learning (RL) agents might have to solve
multiple tasks, each one typically modeled via a reward function. If reward functions are …

Cited by 41 · All 7 versions
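The entry above concerns successor features for policy transfer. A minimal sketch of the standard generalized policy improvement (GPI) step follows, assuming the usual successor-feature setting in which rewards are linear in features, so each stored policy's action value for a task weight vector `w` is the dot product of its successor features with `w`. The function name `gpi_action` and the nested-list layout are hypothetical, and this does not implement the optimistic linear support algorithm itself.

```python
from typing import Sequence


def gpi_action(psi: Sequence[Sequence[Sequence[float]]], w: Sequence[float]) -> int:
    """Pick an action by generalized policy improvement.

    psi[i][a] holds the successor features of stored policy i for action a
    in the current state. With rewards assumed linear in features, the value
    of policy i is Q_i(a) = psi[i][a] . w, and GPI returns
    argmax_a max_i Q_i(a).
    """
    n_actions = len(psi[0])
    best_action, best_q = 0, float("-inf")
    for a in range(n_actions):
        # Best value any stored policy achieves for action a under weights w.
        q = max(sum(f * wk for f, wk in zip(policy[a], w)) for policy in psi)
        if q > best_q:
            best_action, best_q = a, q
    return best_action


# Two stored policies, two actions, two features (toy numbers).
psi = [
    [[1.0, 0.0], [0.0, 0.0]],  # policy 0 is good at feature 0 via action 0
    [[0.0, 0.0], [0.0, 1.0]],  # policy 1 is good at feature 1 via action 1
]
print(gpi_action(psi, [0.3, 0.7]))  # task weighting favours feature 1
print(gpi_action(psi, [0.9, 0.1]))  # task weighting favours feature 0
```

Because the values are recomputed from `w` at decision time, the same stored policies can be reused for a new task weighting without retraining.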

[PDF] thecvf.com

Promptable behaviors: Personalizing multi-objective rewards from human preferences

M Hwang, L Weihs, C Park, K Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com

Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …

Cited by 9 · All 4 versions