Fast high-resolution image synthesis with latent adversarial diffusion distillation

A Sauer, F Boesel, T Dockhorn, A Blattmann… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models are the main driver of progress in image and video synthesis, but suffer
from slow inference speed. Distillation methods, like the recently introduced adversarial …

Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

S Koley, AK Bhunia, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper, for the first time, explores text-to-image diffusion models for Zero-Shot
Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to …

Getting aligned on representational alignment

I Sucholutsky, L Muttenthaler, A Weller, A Peng… - arXiv preprint arXiv …, 2023 - arxiv.org
Biological and artificial information processing systems form representations of the world
that they can use to categorize, reason, plan, navigate, and make decisions. To what extent …

Visual anagrams: Generating multi-view optical illusions with diffusion models

D Geng, I Park, A Owens - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of synthesizing multi-view optical illusions: images that change
appearance upon a transformation such as a flip or rotation. We propose a simple zero-shot …

An introduction to vision-language modeling

F Bordes, RY Pang, A Ajay, AC Li, A Bardes… - arXiv preprint arXiv …, 2024 - arxiv.org
Following the recent popularity of Large Language Models (LLMs), several attempts have
been made to extend them to the visual domain. From having a visual assistant that could …

Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models

S Motamed, W Van Gansbeke… - Proceedings of the …, 2024 - openaccess.thecvf.com
With recent advances in image and video diffusion models for content creation, a plethora of
techniques have been proposed for customizing their generated content. In particular …

Can Biases in ImageNet Models Explain Generalization?

P Gavrikov, J Keuper - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
The robust generalization of models to rare in-distribution (ID) samples drawn from the long
tail of the training distribution and to out-of-training-distribution (OOD) samples is one of the …

Factorized diffusion: Perceptual illusions by noise decomposition

D Geng, I Park, A Owens - arXiv preprint arXiv:2404.11615, 2024 - arxiv.org
Given a factorization of an image into a sum of linear components, we present a zero-shot
method to control each individual component through diffusion model sampling. For …

RadEdit: stress-testing biomedical vision models via diffusion image editing

F Pérez-García, S Bond-Taylor, PP Sanchez… - arXiv preprint arXiv …, 2023 - arxiv.org
Biomedical imaging datasets are often small and biased, meaning that real-world
performance of predictive models can be substantially lower than expected from internal …

Layerwise complexity-matched learning yields an improved model of cortical area V2

N Parthasarathy, OJ Hénaff, EP Simoncelli - arXiv preprint arXiv …, 2023 - arxiv.org
Human ability to recognize complex visual patterns arises through transformations
performed by successive areas in the ventral visual cortex. Deep neural networks trained …