LayoutLLM-T2I: Eliciting layout guidance from LLM for text-to-image generation

L Qu, S Wu, H Fei, L Nie, TS Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org
In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it
possible to generate rich kinds of novel photorealistic images. However, current models still …

X-Mesh: Towards fast and accurate text-driven 3D stylization via dynamic textual guidance

Y Ma, X Zhang, X Sun, J Ji, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV)
and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior …

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org
As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

Rotated multi-scale interaction network for referring remote sensing image segmentation

S Liu, Y Ma, X Zhang, H Wang, J Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that
combines computer vision and natural language processing. Traditional Referring Image …

Beyond first impressions: Integrating joint multi-modal cues for comprehensive 3D representation

H Wang, J Tang, J Ji, X Sun, R Zhang, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org
In recent years, 3D representation learning has turned to 2D vision-language pre-trained
models to overcome data scarcity challenges. However, existing methods simply transfer 2D …

Semi-supervised panoptic narrative grounding

D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …

BEAT: Bi-directional one-to-many embedding alignment for text-based person retrieval

Y Ma, X Sun, J Ji, G Jiang, W Zhuang, R Ji - Proceedings of the 31st …, 2023 - dl.acm.org
Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific
individual based on a textual description. Despite considerable efforts to bridge the gap …

PiGLET: Pixel-level grounding of language expressions with transformers

C González, N Ayobi, I Hernández… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation
of the natural language visual grounding problem. We establish an experimental framework …

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Dynamic prompting of frozen text-to-image diffusion models for panoptic narrative grounding

H Li, T Hui, Z Ding, J Zhang, B Ma, X Wei… - Proceedings of the …, 2024 - dl.acm.org
Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment,
requires a panoptic segmentation of referred objects given a narrative caption. Previous …