Clipscore: A reference-free evaluation metric for image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, ie describing images …

被引用次数：317 相关文章所有 11 个版本

[PDF] arxiv.org

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM computing surveys, 2022 - dl.acm.org

Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

被引用次数：90 相关文章所有 7 个版本

[PDF] thecvf.com

Imagic: Text-based real image editing with diffusion models

B Kawar, S Zada, O Lang, O Tov… - Proceedings of the …, 2023 - openaccess.thecvf.com

Text-conditioned image editing has recently attracted considerable interest. However, most
methods are currently limited to one of the following: specific editing types (eg, object …

被引用次数：744 相关文章所有 6 个版本

[PDF] arxiv.org

Imagen video: High definition video generation with diffusion models

J Ho, W Chan, C Saharia, J Whang, R Gao… - arXiv preprint arXiv …, 2022 - arxiv.org

We present Imagen Video, a text-conditional video generation system based on a cascade
of video diffusion models. Given a text prompt, Imagen Video generates high definition …

被引用次数：1008 相关文章所有 4 个版本

[PDF] thecvf.com

Multi-concept customization of text-to-image diffusion

N Kumari, B Zhang, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …

被引用次数：480 相关文章所有 5 个版本

[PDF] thecvf.com

Scaling up gans for text-to-image synthesis

M Kang, JY Zhu, R Zhang, J Park… - Proceedings of the …, 2023 - openaccess.thecvf.com

The recent success of text-to-image synthesis has taken the world by storm and captured the
general public's imagination. From a technical standpoint, it also marked a drastic change in …

被引用次数：342 相关文章所有 6 个版本

[PDF] thecvf.com

Text2video-zero: Text-to-image diffusion models are zero-shot video generators

L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent text-to-video generation approaches rely on computationally heavy training and
require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …

被引用次数：325 相关文章所有 7 个版本

[PDF] acm.org Full View

Zero-shot image-to-image translation

G Parmar, K Kumar Singh, R Zhang, Y Li, J Lu… - ACM SIGGRAPH 2023 …, 2023 - dl.acm.org

Large-scale text-to-image generative models have shown their remarkable ability to
synthesize diverse, high-quality images. However, directly applying these models for real …

被引用次数：292 相关文章所有 3 个版本

[PDF] thecvf.com

Fatezero: Fusing attentions for zero-shot text-based video editing

C Qi, X Cun, Y Zhang, C Lei, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

The diffusion-based generative models have achieved remarkable success in text-based
image generation. However, since it contains enormous randomness in generation …

被引用次数：202 相关文章所有 6 个版本

[PDF] mlr.press

Stylegan-t: Unlocking the power of gans for fast large-scale text-to-image synthesis

A Sauer, T Karras, S Laine… - … on machine learning, 2023 - proceedings.mlr.press

Text-to-image synthesis has recently seen significant progress thanks to large pretrained
language models, large-scale training data, and the introduction of scalable model families …

被引用次数：168 相关文章所有 8 个版本

高级搜索

QQ 群