Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation

N Ruiz, Y Li, V Jampani, Y Pritch… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling
high-quality and diverse synthesis of images from a given text prompt. However, these models …

Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

Prompt-to-prompt image editing with cross attention control

A Hertz, R Mokady, J Tenenbaum, K Aberman… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent large-scale text-driven synthesis models have attracted much attention thanks to
their remarkable capabilities of generating highly diverse images that follow given text …

Dream3D: Zero-shot text-to-3D synthesis using 3D shape prior and text-to-image diffusion models

J Xu, X Wang, W Cheng, YP Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent CLIP-guided 3D optimization methods, such as DreamFields and PureCLIPNeRF,
have achieved impressive results in zero-shot text-to-3D synthesis. However, due to scratch …

Blended diffusion for text-driven editing of natural images

O Avrahami, D Lischinski… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Natural language offers a highly intuitive interface for image editing. In this paper, we
introduce the first solution for performing local (region-based) edits in generic natural …

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

GALIP: Generative adversarial CLIPs for text-to-image synthesis

M Tao, BK Bao, H Tang, C Xu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Synthesizing high-fidelity complex images from text is challenging. Based on large
pretraining, the autoregressive and diffusion models can synthesize photo-realistic images …

Empowering things with intelligence: A survey of the progress, challenges, and opportunities in artificial intelligence of things

J Zhang, D Tao - IEEE Internet of Things Journal, 2020 - ieeexplore.ieee.org
In the Internet-of-Things (IoT) era, billions of sensors and devices collect and process data
from the environment, transmit them to cloud centers, and receive feedback via the Internet …