Reinforcement learning for fine-tuning text-to-image diffusion models

Y Fan, O Watkins, Y Du, H Liu, M Ryu… - Advances in …, 2024 - proceedings.neurips.cc
Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …

A survey on generative diffusion models

H Cao, C Tan, Z Gao, Y Xu, G Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep generative models have unlocked another profound realm of human creativity. By
capturing and generalizing patterns within data, we have entered the epoch of all …

Training diffusion models with reinforcement learning

K Black, M Janner, Y Du, I Kostrikov… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …

Reinforcement learning for generative AI: A survey

Y Cao, QZ Sheng, J McAuley, L Yao - arXiv preprint arXiv:2308.14328, 2023 - arxiv.org
Deep Generative AI has been a long-standing essential topic in the machine learning
community, which can impact a number of application areas like text generation and …

Directly fine-tuning diffusion models on differentiable rewards

K Clark, P Vicol, K Swersky, DJ Fleet - arXiv preprint arXiv:2309.17400, 2023 - arxiv.org
We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-
tuning diffusion models to maximize differentiable reward functions, such as scores from …

A comprehensive survey on knowledge distillation of diffusion models

W Luo - arXiv preprint arXiv:2304.04262, 2023 - arxiv.org
Diffusion Models (DMs), also referred to as score-based diffusion models, utilize neural
networks to specify score functions. Unlike most other probabilistic models, DMs directly …

Using human feedback to fine-tune diffusion models without any reward model

K Yang, J Tao, J Lyu, C Ge, J Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Using reinforcement learning with human feedback (RLHF) has shown significant promise in
fine-tuning diffusion models. Previous methods start by training a reward model that aligns …

Parrot: Pareto-optimal multi-reward reinforcement learning framework for text-to-image generation

SH Lee, Y Li, J Ke, I Yoo, H Zhang, J Yu… - … on Computer Vision, 2024 - Springer
Recent works have demonstrated that using reinforcement learning (RL) with multiple
quality rewards can improve the quality of generated images in text-to-image (T2I) …

Beta diffusion

M Zhou, T Chen, Z Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce beta diffusion, a novel generative modeling method that integrates demasking
and denoising to generate data within bounded ranges. Using scaled and shifted beta …

Diffusion policy policy optimization

AZ Ren, J Lidard, LL Ankile, A Simeonov… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework
including best practices for fine-tuning diffusion-based policies (e.g., Diffusion Policy) in …