Reinforcement learning for generative AI: State of the art, opportunities and open research challenges

G Franceschelli, M Musolesi - Journal of Artificial Intelligence Research, 2024 - jair.org
Generative Artificial Intelligence (AI) is one of the most exciting developments in
Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the …

Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review

M Uehara, Y Zhao, T Biancalani, S Levine - arXiv preprint arXiv …, 2024 - arxiv.org
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to
optimize downstream reward functions. While diffusion models are widely known to provide …

Training diffusion models with reinforcement learning

K Black, M Janner, Y Du, I Kostrikov… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …
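
This paper's approach (denoising diffusion policy optimization) casts the denoising chain as a multi-step decision process and applies policy-gradient updates to the per-step transition log-probabilities. Below is a minimal toy sketch of that update in PyTorch, with a small linear network standing in for the denoiser and a hypothetical reward_fn in place of a real scorer; it illustrates the technique, not the paper's implementation.

import torch

# Toy "denoiser": predicts the mean of the next, less-noisy sample.
class Denoiser(torch.nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.net = torch.nn.Linear(dim + 1, dim)  # conditions on the timestep
    def forward(self, x, t):
        t_col = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_col], dim=-1))

def reward_fn(x0):              # hypothetical reward, e.g. an aesthetic scorer
    return -(x0 ** 2).sum(-1)   # toy choice: prefer samples near the origin

model, sigma, T = Denoiser(), 0.1, 10
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 8)                       # start from pure noise
logps = []
for t in reversed(range(T)):                 # sample the denoising trajectory
    dist = torch.distributions.Normal(model(x, t), sigma)
    x = dist.sample()                        # stochastic transition, no grad
    logps.append(dist.log_prob(x).sum(-1))   # log-prob of the step taken

# REINFORCE-style update: reward weights the trajectory's log-probabilities.
adv = reward_fn(x)
adv = (adv - adv.mean()) / (adv.std() + 1e-8)    # normalize as a crude baseline
loss = -(adv * torch.stack(logps).sum(0)).mean()
opt.zero_grad(); loss.backward(); opt.step()

Note that sampling itself stays gradient-free; the reward only enters as a weight on the accumulated per-step log-probabilities, which is why this style of fine-tuning does not require the reward to be differentiable.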

Diffusion model alignment using direct preference optimization

B Wallace, M Dang, R Rafailov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models (LLMs) are fine-tuned using human comparison data with
Reinforcement Learning from Human Feedback (RLHF) methods to make them better …
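
The paper adapts Direct Preference Optimization (DPO) to diffusion models: instead of fitting a reward model, it trains directly on (preferred, dispreferred) pairs against a frozen reference model. A sketch of the underlying DPO objective is given below, with placeholder log-likelihood values; in the diffusion setting the paper replaces these likelihood terms with per-step denoising-loss quantities.

import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # logp_*: policy log-likelihoods of the preferred (w) and dispreferred (l)
    # generations; ref_logp_*: the same quantities under a frozen reference.
    # Diffusion-DPO substitutes per-step denoising losses for these terms.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

# Dummy usage: the preferred sample gained likelihood relative to the
# reference while the dispreferred one lost it, so the loss is small.
print(dpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
               torch.tensor([-5.5]), torch.tensor([-6.5])))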

Directly fine-tuning diffusion models on differentiable rewards

K Clark, P Vicol, K Swersky, DJ Fleet - arXiv preprint arXiv:2309.17400, 2023 - arxiv.org
We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-
tuning diffusion models to maximize differentiable reward functions, such as scores from …
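
DRaFT's mechanism is direct backpropagation: because the reward is differentiable, its gradient can flow through the sampling chain into the model weights, and the paper's truncated variant (DRaFT-K) backpropagates through only the last K denoising steps. The following is a toy sketch under that truncation, with a hypothetical linear denoiser and reward_fn, for illustration only.

import torch

denoiser = torch.nn.Linear(9, 8)     # toy denoiser: [x, t] -> next mean
def reward_fn(x0):                   # hypothetical differentiable reward
    return -(x0 ** 2).sum(-1)

opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
T, K, sigma = 50, 2, 0.05            # keep gradients for the last K steps only

x = torch.randn(32, 8)
for t in reversed(range(T)):
    t_col = torch.full((x.shape[0], 1), float(t))
    mean = denoiser(torch.cat([x, t_col], dim=-1))
    x = mean + sigma * torch.randn_like(mean)  # reparameterized, differentiable
    if t == K:
        x = x.detach()               # truncate the graph: DRaFT-K

loss = -reward_fn(x).mean()          # maximize reward = minimize its negative
opt.zero_grad(); loss.backward(); opt.step()

Truncation keeps memory bounded, and the paper reports that truncated variants are competitive with, or better than, backpropagating through the full chain.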

Using human feedback to fine-tune diffusion models without any reward model

K Yang, J Tao, J Lyu, C Ge, J Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Using reinforcement learning with human feedback (RLHF) has shown significant promise in
fine-tuning diffusion models. Previous methods start by training a reward model that aligns …

From r to Q*: Your Language Model is Secretly a Q-Function

R Rafailov, J Hejna, R Park, C Finn - arXiv preprint arXiv:2404.12358, 2024 - arxiv.org
Reinforcement Learning From Human Feedback (RLHF) has been critical to the success
of the latest generation of generative AI models. In response to the complex nature of the …

TextCraftor: Your text encoder can be image quality controller

Y Li, X Liu, A Kag, J Hu, Y Idelbayev… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the
field of content generation, enabling significant advancements in areas like image editing …

InstructVideo: instructing video diffusion models with human feedback

H Yuan, S Zhang, X Wang, Y Wei… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have emerged as the de facto paradigm for video generation. However,
their reliance on web-scale data of varied quality often yields results that are visually …