Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Photorealistic text-to-image diffusion models with deep language understanding

C Saharia, W Chan, S Saxena, L Li… - Advances in neural …, 2022 - proceedings.neurips.cc
We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …

Inversion-based style transfer with diffusion models

Y Zhang, N Huang, F Tang, H Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The artistic style within a painting is the means of expression, which includes not only the
painting material, colors, and brushstrokes, but also the high-level attributes, including …

Zero-shot contrastive loss for text-guided diffusion image style transfer

S Yang, H Hwang, JC Ye - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Diffusion models have shown great promise in text-guided image style transfer, but there is a
trade-off between style transformation and content preservation due to their stochastic …

Guiding instruction-based image editing via multimodal large language models

TJ Fu, W Hu, X Du, WY Wang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Instruction-based image editing improves the controllability and flexibility of image
manipulation via natural commands without elaborate descriptions or regional masks …

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

TJ Fu, L Yu, N Zhang, CY Fu, JC Su… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating a video given the first several static frames is challenging as it anticipates
reasonable future frames with temporal coherence. Besides video prediction, the ability to …

DiffStyler: Controllable dual diffusion for text-driven image stylization

N Huang, Y Zhang, F Tang, C Ma… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Despite the impressive results of arbitrary image-guided style transfer methods, text-driven
image stylization has recently been proposed for transferring a natural image into a stylized …

ControlStyle: Text-driven stylized image generation using diffusion priors

J Chen, Y Pan, T Yao, T Mei - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Recently, the multimedia community has witnessed the rise of diffusion models trained on
large-scale multi-modal data for visual content creation, particularly in the field of text-to …

Learning-based artificial intelligence artwork: Methodology taxonomy and quality evaluation

Q Wang, HN Dai, J Yang, C Guo, P Childs… - ACM Computing …, 2024 - dl.acm.org
With the development of the theory and technology of computer science, machine or
computer painting is increasingly being explored in the creation of art. Machine-made works …

Affective image filter: Reflecting emotions from text to images

S Weng, P Zhang, Z Chang, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding the emotions in text and presenting them visually is a very challenging
problem that requires a deep understanding of natural language and high-quality image …