Y Ma, X Zhang, X Sun, J Ji, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior …
As one of the core video semantic understanding tasks, Video Semantic Role Labeling (VidSRL) aims to detect the salient events from given videos, by recognizing the predict …
S Liu, Y Ma, X Zhang, H Wang, J Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image …
In recent years, 3D representation learning has turned to 2D vision-language pre-trained models to overcome data scarcity challenges. However, existing methods simply transfer 2D …
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG) remains hindered by costly annotations. In this paper, we introduce a novel Semi …
Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific individual based on a textual description. Despite considerable efforts to bridge the gap …
This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem. We establish an experimental framework …
T Guo, H Wang, Y Ma, J Ji, X Sun - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent advancements in single-stage Panoptic Narrative Grounding (PNG) have demonstrated significant potential. These methods predict pixel-level masks by directly …
Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment, requires a panoptic segmentation of referred objects given a narrative caption. Previous …