ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

X Hu, R Wang, Y Fang, B Fu, P Cheng, G Yu - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated remarkable performance in the domain of text-to-image
generation. However, most widely used models still employ CLIP as their text encoder …

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

L Eyring, S Karthik, K Roth, A Dosovitskiy… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-Image (T2I) models have made significant advancements in recent years, but they
still struggle to accurately capture intricate details specified in complex compositional …

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

D Jiang, G Song, X Wu, R Zhang, D Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated great success in the field of text-to-image generation.
However, alleviating the misalignment between the text prompts and images is still …

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

Z Wang, A Li, Z Li, X Liu - arXiv preprint arXiv:2407.05600, 2024 - arxiv.org
Despite the success achieved by existing image generation and editing methods, current
models still struggle with complex problems including intricate text prompts, and the …

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

Z Yang, R Feng, K Yan, H Wang, Z Wang, S Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic
abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream …

Position: Do Not Explain Vision Models Without Context

P Tomaszewska, P Biecek - Forty-first International Conference on Machine … - openreview.net
Does the stethoscope in the picture make the adjacent person a doctor or a patient? This, of
course, depends on the contextual relationship of the two objects. If it's obvious, why don't …