Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

On the opportunities and challenges of foundation models for geospatial artificial intelligence

G Mai, W Huang, J Sun, S Song, D Mishra… - arXiv preprint arXiv …, 2023 - arxiv.org
Large pre-trained models, also known as foundation models (FMs), are trained in a task-
agnostic manner on large-scale data and can be adapted to a wide range of downstream …

StyleGAN-T: Unlocking the power of GANs for fast large-scale text-to-image synthesis

A Sauer, T Karras, S Laine… - … on machine learning, 2023 - proceedings.mlr.press
Text-to-image synthesis has recently seen significant progress thanks to large pretrained
language models, large-scale training data, and the introduction of scalable model families …

InstantBooth: Personalized text-to-image generation without test-time finetuning

J Shi, W Xiong, Z Lin, HJ Jung - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent advances in personalized image generation have enabled pre-trained text-to-image
models to learn new concepts from specific image sets. However, these methods often …

Ablating concepts in text-to-image diffusion models

N Kumari, B Zhang, SY Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful
compositional ability. However, these models are typically trained on an enormous amount …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the …

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …

Holistic evaluation of text-to-image models

T Lee, M Yasunaga, C Meng, Y Mai… - Advances in …, 2024 - proceedings.neurips.cc
The stunning qualitative improvement of text-to-image models has led to their widespread
attention and adoption. However, we lack a comprehensive quantitative understanding of …

Adversarial diffusion distillation

A Sauer, D Lorenz, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that
efficiently samples large-scale foundational image diffusion models in just 1-4 steps while …

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

J Chen, J Yu, C Ge, L Yao, E Xie, Y Wu, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The most advanced text-to-image (T2I) models require significant training costs (e.g., millions
of GPU hours), seriously hindering fundamental innovation in the AIGC community …