No "zero-shot" without exponential data: Pretraining concept frequency determines multimodal model performance

V Udandarao, A Prabhu, A Ghosh… - The Thirty-eighth …, 2024 - openreview.net
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation
performance of multimodal models, such as CLIP for classification and Stable-Diffusion for …

Comparison of Image Generation Models for Abstract and Concrete Event Descriptions

M Khaliq, D Frassinelli… - Proceedings of the 4th …, 2024 - aclanthology.org
With the advent of diffusion-based image generation models such as DALL-E, Stable
Diffusion and Midjourney, high quality images can be easily generated using textual inputs …