Reinforcement learning for fine-tuning text-to-image diffusion models

Y Fan, O Watkins, Y Du, H Liu, M Ryu… - Advances in …, 2024 - proceedings.neurips.cc
Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …

A survey on generative diffusion models

H Cao, C Tan, Z Gao, Y Xu, G Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep generative models have unlocked another profound realm of human creativity. By
capturing and generalizing patterns within data, we have entered the epoch of all …

Training diffusion models with reinforcement learning

K Black, M Janner, Y Du, I Kostrikov… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …

Reinforcement learning for generative AI: A survey

Y Cao, QZ Sheng, J McAuley, L Yao - arXiv preprint arXiv:2308.14328, 2023 - arxiv.org
Deep Generative AI has been a long-standing essential topic in the machine learning
community, which can impact a number of application areas like text generation and …

Directly fine-tuning diffusion models on differentiable rewards

K Clark, P Vicol, K Swersky, DJ Fleet - arXiv preprint arXiv:2309.17400, 2023 - arxiv.org
We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-
tuning diffusion models to maximize differentiable reward functions, such as scores from …

A comprehensive survey on knowledge distillation of diffusion models

W Luo - arXiv preprint arXiv:2304.04262, 2023 - arxiv.org
Diffusion Models (DMs), also referred to as score-based diffusion models, utilize neural
networks to specify score functions. Unlike most other probabilistic models, DMs directly …

Using human feedback to fine-tune diffusion models without any reward model

K Yang, J Tao, J Lyu, C Ge, J Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Using reinforcement learning with human feedback (RLHF) has shown significant promise in
fine-tuning diffusion models. Previous methods start by training a reward model that aligns …

Parrot: Pareto-optimal multi-reward reinforcement learning framework for text-to-image generation

SH Lee, Y Li, J Ke, I Yoo, H Zhang, J Yu… - … on Computer Vision, 2024 - Springer
Recent works have demonstrated that using reinforcement learning (RL) with multiple
quality rewards can improve the quality of generated images in text-to-image (T2I) …

Beta diffusion

M Zhou, T Chen, Z Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce beta diffusion, a novel generative modeling method that integrates demasking
and denoising to generate data within bounded ranges. Using scaled and shifted beta …

Diffusion policy policy optimization

AZ Ren, J Lidard, LL Ankile, A Simeonov… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework
including best practices for fine-tuning diffusion-based policies (e.g., Diffusion Policy) in …