A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

P Marcos-Manchón, R Alcover-Couso… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating
high-quality images from text prompts, models such as Stable Diffusion have been …

LIME: Localized image editing via attention regularization in diffusion models

E Simsar, A Tonioni, Y Xian, T Hofmann… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models (DMs) have gained prominence due to their ability to generate high-quality,
varied images, with recent advancements in text-to-image generation. The research focus is …

Stable Diffusion exposed: Gender bias from prompt to image

Y Wu, Y Nakashima, N Garcia - arXiv preprint arXiv:2312.03027, 2023 - arxiv.org
Recent studies have highlighted biases in generative models, shedding light on their
predisposition towards gender-based stereotypes and imbalances. This paper contributes to …

LocInv: Localization-aware Inversion for Text-Guided Image Editing

C Tang, K Wang, F Yang, J van de Weijer - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation
capabilities from textual prompts. Building on T2I diffusion models, text-guided image …

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

D Yang, R Dong, J Ji, Y Ma, H Wang, X Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, diffusion models have increasingly demonstrated their capabilities in vision
understanding. By leveraging prompt-based learning to construct sentences, these models …

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

Z Zhu, X Feng, D Chen, J Yuan, C Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the visual representations produced by a pre-trained text-to-
video (T2V) diffusion model for video understanding tasks. We hypothesize that the latent …

Do text-free diffusion models learn discriminative visual representations?

S Mukhopadhyay, M Gwilliam, Y Yamaguchi… - arXiv preprint arXiv …, 2023 - arxiv.org
While many unsupervised learning models focus on one family of tasks, either generative or
discriminative, we explore the possibility of a unified representation learner: a model which …

InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

N Saini, N Bodla, A Shrivastava… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InVi, an approach for inserting or replacing objects within videos (referred to
as inpainting) using off-the-shelf, text-to-image latent diffusion models. InVi targets controlled …

GazeHTA: End-to-end Gaze Target Detection with Head-Target Association

ZY Lin, JY Chew, J van Gemert, X Zhang - arXiv preprint arXiv:2404.10718, 2024 - arxiv.org
We propose an end-to-end approach for gaze target detection: predicting a head-target
connection between individuals and the target image regions they are looking at. Most of the …