DreamReward: Text-to-3D generation with human preference

J Ye, F Liu, Q Li, Z Wang, Y Wang, X Wang… - … on Computer Vision, 2025 - Springer
3D content creation from text prompts has shown remarkable success recently.
However, current text-to-3D methods often generate 3D results that do not align well with …

Efficient diffusion models: A comprehensive survey from principles to practices

Z Ma, Y Zhang, G Jia, L Zhao, Y Ma, M Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most popular and sought-after generative models in recent years, diffusion
models have sparked the interest of many researchers and steadily shown excellent …

Direct unlearning optimization for robust and safe text-to-image models

YH Park, S Yun, JH Kim, J Kim, G Jang, Y Jeong… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in text-to-image (T2I) models have greatly benefited from large-scale
datasets, but they also pose significant risks due to the potential generation of unsafe …

Alignment of diffusion models: Fundamentals, challenges, and future

B Liu, S Shao, B Li, L Bai, Z Xu, H Xiong, J Kwok… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …

Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives

R Li, YX Zhang, Y Zhang, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose Lodge, a network capable of generating extremely long dance sequences
conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion …

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …

IterComp: Iterative composition-aware feedback learning from model gallery for text-to-image generation

X Zhang, L Yang, G Li, Y Cai, J Xie, Y Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Advanced diffusion models like RPG, Stable Diffusion 3 and FLUX have made notable
strides in compositional text-to-image generation. However, these methods typically exhibit …

Aligning diffusion models by optimizing human utility

S Li, K Kallidromitis, A Gokul, Y Kato… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by
formulating the alignment objective as the maximization of expected human utility. Since this …

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arXiv preprint arXiv …, 2024 - arxiv.org
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …