Colombian drug lord Pablo Escobar, while on the run, purportedly burned two million dollars in banknotes to keep his daughter warm. A stark reminder that, in life …
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …
Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in …
M Jain, SC Raparthy… - International …, 2023 - proceedings.mlr.press
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material …
X Lin, Z Yang, X Zhang… - Advances in neural …, 2022 - proceedings.neurips.cc
Expensive multi-objective optimization problems can be found in many real-world applications, where their objective function evaluations involve expensive computations or …
Many real-world control problems involve conflicting objectives where we desire a dense and high-quality set of control policies that are optimal for different objective preferences …
While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning …
Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment. With that in mind, maintaining a population of agents is …
Y Ge, X Zhao, L Yu, S Paul, D Hu, CC Hsieh… - Proceedings of the …, 2022 - dl.acm.org
The issue of fairness in recommendation is becoming increasingly essential as Recommender Systems (RS) touch and influence more and more people in their daily lives …