Learning cross-modal context graph for visual grounding

Y Zhou, H Zheng, X Huang, S Hao, D Li… - ACM Transactions on …, 2022 - dl.acm.org

Graph neural networks provide a powerful toolkit for embedding real-world graphs into
low-dimensional spaces according to specific tasks. Up to now, there have been several surveys …

Cited by 98 · Related articles · All 4 versions

[PDF] arxiv.org

A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu, S Yang… - arXiv preprint arXiv …, 2022 - arxiv.org

Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …

Cited by 43 · Related articles · All 3 versions

[PDF] github.io

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org

The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

Cited by 304 · Related articles · All 11 versions

[PDF] thecvf.com

What does CLIP know about a red circle? Visual prompt engineering for VLMs

A Shtedritski, C Rupprecht… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Abstract Large-scale Vision-Language Models, such as CLIP, learn powerful image-text
representations that have found numerous applications, from zero-shot classification to text …

Cited by 57 · Related articles · All 7 versions

[PDF] thecvf.com

EDA: Explicit text-decoupling and dense alignment for 3D visual grounding

Y Wu, X Cheng, R Zhang, Z Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract 3D visual grounding aims to find the object within point clouds mentioned by
free-form natural language descriptions with rich semantic cues. However, existing methods …

Cited by 44 · Related articles · All 5 versions

[PDF] mlr.press

LanguageRefer: Spatial-language model for 3D visual grounding

J Roh, K Desingh, A Farhadi… - Conference on Robot …, 2022 - proceedings.mlr.press

For robots to understand human instructions and perform meaningful tasks in the near
future, it is important to develop learned models that comprehend referential language to …

Cited by 73 · Related articles · All 6 versions

[PDF] thecvf.com

Shifting more attention to visual backbone: Query-modulated refinement networks for end-to-end visual grounding

J Ye, J Tian, M Yan, X Yang, X Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Visual grounding focuses on establishing fine-grained alignment between vision and natural
language, which has essential applications in multimodal reasoning systems. Existing …

Cited by 41 · Related articles · All 5 versions

[PDF] thecvf.com

Multi-modal relational graph for cross-modal video moment retrieval

Y Zeng, D Cao, X Wei, M Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com

Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims
to rank a video moment from pre-segmented video moment candidates that best matches …

Cited by 71 · Related articles · All 4 versions

[PDF] neurips.cc

Look around and refer: 2D synthetic semantics knowledge distillation for 3D visual grounding

E Bakr, Y Alsaedy, M Elhoseiny - Advances in neural …, 2022 - proceedings.neurips.cc

Abstract 3D visual grounding task has been explored with visual and language streams to
comprehend referential language for identifying targeted objects in 3D scenes. However …

Cited by 23 · Related articles · All 9 versions

[PDF] thecvf.com

Free-form description guided 3D visual graph network for object grounding in point cloud

M Feng, Z Li, Q Li, L Zhang, XD Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

Abstract 3D object grounding aims to locate the most relevant target object in a raw point
cloud scene based on a free-form language description. Understanding complex and …

Cited by 63 · Related articles · All 7 versions
