Diffusion models: A comprehensive survey of methods and applications

L Yang, Z Zhang, Y Song, S Hong, R Xu, Y Zhao… - ACM Computing …, 2023 - dl.acm.org
Diffusion models have emerged as a powerful new family of deep generative models with
record-breaking performance in many applications, including image synthesis, video …

Synthetic data from diffusion models improves ImageNet classification

S Azizi, S Kornblith, C Saharia, M Norouzi… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep generative models are becoming increasingly powerful, now generating diverse high
fidelity photo-realistic samples given text prompts. Have they reached the point where …

Improving multimodal datasets with image captioning

T Nguyen, SY Gadre, G Ilharco… - Advances in Neural …, 2024 - proceedings.neurips.cc
Massive web datasets play a key role in the success of large vision-language models like
CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to …

Dream the impossible: Outlier imagination with diffusion models

X Du, Y Sun, J Zhu, Y Li - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Utilizing auxiliary outlier datasets to regularize the machine learning model has
demonstrated promise for out-of-distribution (OOD) detection and safe prediction. Due to the …

Diverse data augmentation with diffusions for effective test-time prompt tuning

CM Feng, K Yu, Y Liu, S Khan… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Benefiting from prompt tuning, recent years have witnessed the promising performance of
pre-trained vision-language models, e.g., CLIP, on versatile downstream tasks. In this paper …

Self-consuming generative models go MAD

S Alemohammad, J Casco-Rodriguez, L Luzi… - arXiv preprint arXiv …, 2023 - arxiv.org
Seismic advances in generative AI algorithms for imagery, text, and other data types have led
to the temptation to use synthetic data to train next-generation models. Repeating this …

FreeMask: Synthetic images with dense annotations make stronger segmentation models

L Yang, X Xu, B Kang, Y Shi… - Advances in Neural …, 2024 - proceedings.neurips.cc
Semantic segmentation has witnessed tremendous progress due to the proposal of various
advanced network architectures. However, they are extremely hungry for delicate …

Fine-tuning multimodal LLMs to follow zero-shot demonstrative instructions

J Li, K Pan, Z Ge, M Gao, W Ji, W Zhang… - The Twelfth …, 2023 - openreview.net
Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing
Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can …

Waffling around for performance: Visual classification with random words and broad concepts

K Roth, JM Kim, A Koepke, O Vinyals… - Proceedings of the …, 2023 - openaccess.thecvf.com
The visual classification performance of vision-language models such as CLIP has been
shown to benefit from additional semantic knowledge from large language models (LLMs) …

Diversity is definitely needed: Improving model-agnostic zero-shot classification via Stable Diffusion

J Shipard, A Wiliem, KN Thanh… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this work, we investigate the problem of Model-Agnostic Zero-Shot Classification
(MA-ZSC), which refers to training non-specific classification architectures (downstream models) …