MirrorGAN: Learning text-to-image generation by redescription

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 148 · Related articles · All 7 versions

[PDF] arxiv.org

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Cited by 197 · Related articles · All 11 versions

[PDF] thecvf.com

DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation

N Ruiz, Y Li, V Jampani, Y Pritch… - Proceedings of the …, 2023 - openaccess.thecvf.com

Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-
quality and diverse synthesis of images from a given text prompt. However, these models …

Cited by 1633 · Related articles · All 6 versions

[PDF] thecvf.com

Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

Cited by 301 · Related articles · All 7 versions

[PDF] arxiv.org

Prompt-to-prompt image editing with cross attention control

A Hertz, R Mokady, J Tenenbaum, K Aberman… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent large-scale text-driven synthesis models have attracted much attention thanks to
their remarkable capabilities of generating highly diverse images that follow given text …

Cited by 1020 · Related articles · All 3 versions

[PDF] thecvf.com

Dream3D: Zero-shot text-to-3D synthesis using 3D shape prior and text-to-image diffusion models

J Xu, X Wang, W Cheng, YP Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF,
have achieved impressive results in zero-shot text-to-3D synthesis. However, due to scratch …

Cited by 136 · Related articles · All 5 versions

[PDF] thecvf.com

Blended diffusion for text-driven editing of natural images

O Avrahami, D Lischinski… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Natural language offers a highly intuitive interface for image editing. In this paper, we
introduce the first solution for performing local (region-based) edits in generic natural …

Cited by 698 · Related articles · All 6 versions

[PDF] thecvf.com

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

Cited by 647 · Related articles · All 10 versions

[PDF] thecvf.com

GALIP: Generative adversarial CLIPs for text-to-image synthesis

M Tao, BK Bao, H Tang, C Xu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Synthesizing high-fidelity complex images from text is challenging. Based on large
pretraining, the autoregressive and diffusion models can synthesize photo-realistic images …

Cited by 77 · Related articles · All 6 versions

[PDF] arxiv.org

Empowering things with intelligence: a survey of the progress, challenges, and opportunities in artificial intelligence of things

J Zhang, D Tao - IEEE Internet of Things Journal, 2020 - ieeexplore.ieee.org

In the Internet-of-Things (IoT) era, billions of sensors and devices collect and process data
from the environment, transmit them to cloud centers, and receive feedback via the Internet …

Cited by 511 · Related articles · All 4 versions
