Using human feedback to fine-tune diffusion models without any reward model

J Ye, F Liu, Q Li, Z Wang, Y Wang, X Wang… - … on Computer Vision, 2025 - Springer

3D content creation from text prompts has shown remarkable success recently.
However, current text-to-3D methods often generate 3D results that do not align well with …

Cited by 17 - Related articles - All 4 versions

[PDF] arxiv.org

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

As one of the most popular and sought-after generative models in the recent years, diffusion
models have sparked the interests of many researchers and steadily shown excellent …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

Direct unlearning optimization for robust and safe text-to-image models

YH Park, S Yun, JH Kim, J Kim, G Jang, Y Jeong… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in text-to-image (T2I) models have greatly benefited from large-scale
datasets, but they also pose significant risks due to the potential generation of unsafe …

Cited by 8 - Related articles - All 4 versions

[PDF] arxiv.org

Alignment of diffusion models: Fundamentals, challenges, and future

B Liu, S Shao, B Li, L Bai, Z Xu, H Xiong, J Kwok… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …

Cited by 7 - Related articles - All 3 versions

[PDF] thecvf.com

Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives

R Li, YX Zhang, Y Zhang, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We propose Lodge, a network capable of generating extremely long dance sequences
conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion …

Cited by 6 - Related articles - All 4 versions

[PDF] arxiv.org

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …

Cited by 16 - Related articles - All 3 versions

[PDF] arxiv.org

Itercomp: Iterative composition-aware feedback learning from model gallery for text-to-image generation

X Zhang, L Yang, G Li, Y Cai, J Xie, Y Tang… - arXiv preprint arXiv …, 2024 - arxiv.org

Advanced diffusion models like RPG, Stable Diffusion 3 and FLUX have made notable
strides in compositional text-to-image generation. However, these methods typically exhibit …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

Aligning diffusion models by optimizing human utility

S Li, K Kallidromitis, A Gokul, Y Kato… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by
formulating the alignment objective as the maximization of expected human utility. Since this …

Cited by 7 - Related articles - All 2 versions

[PDF] arxiv.org

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arXiv preprint arXiv …, 2024 - arxiv.org

Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …

Cited by 1 - Related articles - All 2 versions
