Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Goals, usefulness and abstraction in value-based choice

B De Martino, A Cortese - Trends in Cognitive Sciences, 2023 - cell.com
Colombian drug lord Pablo Escobar, while on the run, purportedly burned two
million dollars in banknotes to keep his daughter warm. A stark reminder that, in life …

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …

A practical guide to multi-objective reinforcement learning and planning

CF Hayes, R Rădulescu, E Bargiacchi… - Autonomous Agents and …, 2022 - Springer
Real-world sequential decision-making tasks are generally complex, requiring trade-offs
between multiple, often conflicting, objectives. Despite this, the majority of research in …

Multi-objective GFlowNets

M Jain, SC Raparthy… - International …, 2023 - proceedings.mlr.press
We study the problem of generating diverse candidates in the context of Multi-Objective
Optimization. In many applications of machine learning such as drug discovery and material …

Pareto set learning for expensive multi-objective optimization

X Lin, Z Yang, X Zhang… - Advances in neural …, 2022 - proceedings.neurips.cc
Expensive multi-objective optimization problems can be found in many real-world
applications, where their objective function evaluations involve expensive computations or …

Prediction-guided multi-objective reinforcement learning for continuous robot control

J Xu, Y Tian, P Ma, D Rus, S Sueda… - … on machine learning, 2020 - proceedings.mlr.press
Many real-world control problems involve conflicting objectives where we desire a dense
and high-quality set of control policies that are optimal for different objective preferences …

Personalized soups: Personalized large language model alignment via post-hoc parameter merging

J Jang, S Kim, BY Lin, Y Wang, J Hessel… - arXiv preprint arXiv …, 2023 - arxiv.org
While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning …

Effective diversity in population based reinforcement learning

J Parker-Holder, A Pacchiano… - Advances in …, 2020 - proceedings.neurips.cc
Exploration is a key problem in reinforcement learning, since agents can only learn from
data they acquire in the environment. With that in mind, maintaining a population of agents is …

Toward Pareto efficient fairness-utility trade-off in recommendation through reinforcement learning

Y Ge, X Zhao, L Yu, S Paul, D Hu, CC Hsieh… - Proceedings of the …, 2022 - dl.acm.org
The issue of fairness in recommendation is becoming increasingly essential as
Recommender Systems (RS) touch and influence more and more people in their daily lives …