Graph neural networks: Taxonomy, advances, and trends

Y Zhou, H Zheng, X Huang, S Hao, D Li… - ACM Transactions on …, 2022 - dl.acm.org
Graph neural networks provide a powerful toolkit for embedding real-world graphs into
low-dimensional spaces according to specific tasks. Up to now, there have been several surveys …

A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu, S Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

What does CLIP know about a red circle? Visual prompt engineering for VLMs

A Shtedritski, C Rupprecht… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Large-scale Vision-Language Models, such as CLIP, learn powerful image-text
representations that have found numerous applications, from zero-shot classification to text …

EDA: Explicit text-decoupling and dense alignment for 3D visual grounding

Y Wu, X Cheng, R Zhang, Z Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D visual grounding aims to find the object within point clouds mentioned by free-form
natural language descriptions with rich semantic cues. However, existing methods …

LanguageRefer: Spatial-language model for 3D visual grounding

J Roh, K Desingh, A Farhadi… - Conference on Robot …, 2022 - proceedings.mlr.press
For robots to understand human instructions and perform meaningful tasks in the near
future, it is important to develop learned models that comprehend referential language to …

Shifting more attention to visual backbone: Query-modulated refinement networks for end-to-end visual grounding

J Ye, J Tian, M Yan, X Yang, X Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Visual grounding focuses on establishing fine-grained alignment between vision and natural
language, which has essential applications in multimodal reasoning systems. Existing …

Multi-modal relational graph for cross-modal video moment retrieval

Y Zeng, D Cao, X Wei, M Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims
to rank a video moment from pre-segmented video moment candidates that best matches …

Look around and refer: 2D synthetic semantics knowledge distillation for 3D visual grounding

E Bakr, Y Alsaedy, M Elhoseiny - Advances in neural …, 2022 - proceedings.neurips.cc
3D visual grounding task has been explored with visual and language streams to
comprehend referential language for identifying targeted objects in 3D scenes. However …

Free-form description guided 3D visual graph network for object grounding in point cloud

M Feng, Z Li, Q Li, L Zhang, XD Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
3D object grounding aims to locate the most relevant target object in a raw point
cloud scene based on a free-form language description. Understanding complex and …