A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

BLIP-Diffusion: Pre-trained subject representation for controllable text-to-image generation and editing

D Li, J Li, S Hoi - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Subject-driven text-to-image generation models create novel renditions of an input subject
based on text prompts. Existing models suffer from lengthy fine-tuning and difficulties …

SVDiff: Compact parameter space for diffusion fine-tuning

L Han, Y Li, H Zhang, P Milanfar… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, diffusion models have achieved remarkable success in text-to-image generation,
enabling the creation of high-quality images from text prompts and various conditions …

Text-to-image diffusion models in generative AI: A survey

C Zhang, C Zhang, M Zhang, IS Kweon - arXiv preprint arXiv:2303.07909, 2023 - arxiv.org
This survey reviews text-to-image diffusion models in the context that diffusion models have
emerged to be popular for a wide range of generative tasks. As a self-contained work, this …

Video-P2P: Video editing with cross-attention control

S Liu, Y Zhang, W Li, Z Lin, J Jia - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video-P2P is the first framework for real-world video editing with cross-attention control.
While attention control has proven effective for image editing with pre-trained image …

InstructDiffusion: A generalist modeling interface for vision tasks

Z Geng, B Yang, T Hang, C Li, S Gu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present InstructDiffusion, a unified and generic framework for aligning computer vision
tasks with human instructions. Unlike existing approaches that integrate prior knowledge …

Composer: Creative and controllable image synthesis with composable conditions

L Huang, D Chen, Y Liu, Y Shen, D Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent large-scale generative models learned on big data are capable of synthesizing
incredible images yet suffer from limited controllability. This work offers a new generation …

HIVE: Harnessing human feedback for instructional visual editing

S Zhang, X Yang, Y Feng, C Qin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Incorporating human feedback has been shown to be crucial to align text generated by large
language models to human preferences. We hypothesize that state-of-the-art instructional …

TF-ICON: Diffusion-based training-free cross-domain image composition

S Lu, Y Liu, AWK Kong - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Text-driven diffusion models have exhibited impressive generative capabilities, enabling
various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free …