- Academic Resource Search

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Cited by 263 · Related articles · All 11 versions

[PDF] neurips.cc

Photorealistic text-to-image diffusion models with deep language understanding

C Saharia, W Chan, S Saxena, L Li… - Advances in neural …, 2022 - proceedings.neurips.cc

We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …

Cited by 5494 · Related articles · All 11 versions

[PDF] thecvf.com

Inversion-based style transfer with diffusion models

Y Zhang, N Huang, F Tang, H Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com

The artistic style within a painting is the means of expression, which includes not only the
painting material, colors, and brushstrokes, but also the high-level attributes, including …

Cited by 246 · Related articles · All 6 versions

[PDF] thecvf.com

Zero-shot contrastive loss for text-guided diffusion image style transfer

S Yang, H Hwang, JC Ye - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Diffusion models have shown great promise in text-guided image style transfer, but there is a
trade-off between style transformation and content preservation due to their stochastic …

Cited by 58 · Related articles · All 6 versions

[PDF] arxiv.org

Guiding instruction-based image editing via multimodal large language models

TJ Fu, W Hu, X Du, WY Wang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

Instruction-based image editing improves the controllability and flexibility of image
manipulation via natural commands without elaborate descriptions or regional masks …

Cited by 75 · Related articles · All 5 versions

[PDF] thecvf.com

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

TJ Fu, L Yu, N Zhang, CY Fu, JC Su… - Proceedings of the …, 2023 - openaccess.thecvf.com

Generating a video given the first several static frames is challenging as it anticipates
reasonable future frames with temporal coherence. Besides video prediction, the ability to …

Cited by 39 · Related articles · All 8 versions

[PDF] arxiv.org

Diffstyler: Controllable dual diffusion for text-driven image stylization

N Huang, Y Zhang, F Tang, C Ma… - … on Neural Networks …, 2024 - ieeexplore.ieee.org

Despite the impressive results of arbitrary image-guided style transfer methods, text-driven
image stylization has recently been proposed for transferring a natural image into a stylized …

Cited by 42 · Related articles · All 5 versions

[PDF] arxiv.org

Controlstyle: Text-driven stylized image generation using diffusion priors

J Chen, Y Pan, T Yao, T Mei - Proceedings of the 31st ACM International …, 2023 - dl.acm.org

Recently, the multimedia community has witnessed the rise of diffusion models trained on
large-scale multi-modal data for visual content creation, particularly in the field of text-to …

Cited by 28 · Related articles · All 4 versions

[PDF] acm.org

Learning-based Artificial Intelligence Artwork: Methodology Taxonomy and Quality Evaluation

Q Wang, HN Dai, J Yang, C Guo, P Childs… - ACM Computing …, 2024 - dl.acm.org

With the development of the theory and technology of computer science, machine or
computer painting is increasingly being explored in the creation of art. Machine-made works …

Affective image filter: Reflecting emotions from text to images

S Weng, P Zhang, Z Chang, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Understanding the emotions in text and presenting them visually is a very challenging
problem that requires a deep understanding of natural language and high-quality image …

Cited by 10 · Related articles · All 4 versions
