Towards real-time panoptic narrative grounding by an end-to-end grounding network

L Qu, S Wu, H Fei, L Nie, TS Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org

In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it
possible to generate rich kinds of novel photorealistic images. However, current models still …

Cited by 94 Related articles All 3 versions

[PDF] thecvf.com

X-mesh: Towards fast and accurate text-driven 3d stylization via dynamic textual guidance

Y Ma, X Zhang, X Sun, J Ji, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV)
and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior …

Cited by 39 Related articles All 5 versions

[PDF] acm.org

Constructing holistic spatio-temporal scene graph for video semantic role labeling

Y Zhao, H Fei, Y Cao, B Li, M Zhang, J Wei… - Proceedings of the 31st …, 2023 - dl.acm.org

As one of the core video semantic understanding tasks, Video Semantic Role Labeling
(VidSRL) aims to detect the salient events from given videos, by recognizing the predict …

Cited by 39 Related articles All 5 versions

[PDF] thecvf.com

Rotated multi-scale interaction network for referring remote sensing image segmentation

S Liu, Y Ma, X Zhang, H Wang, J Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com

Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that
combines computer vision and natural language processing. Traditional Referring Image …

Cited by 32 Related articles All 3 versions

[PDF] arxiv.org

Beyond first impressions: Integrating joint multi-modal cues for comprehensive 3d representation

H Wang, J Tang, J Ji, X Sun, R Zhang, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org

In recent years, 3D representation learning has turned to 2D vision-language pre-trained
models to overcome data scarcity challenges. However, existing methods simply transfer 2D …

Cited by 12 Related articles All 3 versions

[PDF] arxiv.org

Semi-supervised panoptic narrative grounding

D Yang, J Ji, X Sun, H Wang, Y Li, Y Ma… - Proceedings of the 31st …, 2023 - dl.acm.org

Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG)
remains hindered by costly annotations. In this paper, we introduce a novel Semi …

Cited by 8 Related articles All 3 versions

[PDF] arxiv.org

Beat: Bi-directional one-to-many embedding alignment for text-based person retrieval

Y Ma, X Sun, J Ji, G Jiang, W Zhuang, R Ji - Proceedings of the 31st …, 2023 - dl.acm.org

Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific
individual based on a textual description. Despite considerable efforts to bridge the gap …

Cited by 15 Related articles All 3 versions

Piglet: Pixel-level grounding of language expressions with transformers

C González, N Ayobi, I Hernández… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation
of the natural language visual grounding problem. We establish an experimental framework …

Cited by 8 Related articles All 5 versions

[PDF] aaai.org

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have
demonstrated significant potential. These methods predict pixel-level masks by directly …

Cited by 3 Related articles

[PDF] arxiv.org

Dynamic prompting of frozen text-to-image diffusion models for panoptic narrative grounding

H Li, T Hui, Z Ding, J Zhang, B Ma, X Wei… - Proceedings of the …, 2024 - dl.acm.org

Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment,
requires a panoptic segmentation of referred objects given a narrative caption. Previous …

Cited by 2 Related articles All 4 versions
