Unifying 3D Vision-Language Understanding via Promptable Queries

Z Zhu, Z Zhang, X Ma, X Niu, Y Chen, B Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
A unified model for 3D vision-language (3D-VL) understanding is expected to take various
scene representations and perform a wide range of tasks in a 3D scene. However, a …

UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet

J Ye, J Tian, M Yan, H Xu, Q Ye, Y Shi, X Yang… - ACM Transactions on … - dl.acm.org
Referring expression comprehension aims to align natural language queries with visual
scenes, which requires establishing fine-grained correspondence between vision and …

PointLLM: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org
The unprecedented advancements in Large Language Models (LLMs) have had a profound impact on
natural language processing but have yet to fully embrace the realm of 3D …

What goes beyond multi-modal fusion in one-stage referring expression comprehension: An empirical study

G Luo, Y Zhou, J Sun, S Huang, X Sun, Q Ye… - arXiv preprint arXiv …, 2022 - arxiv.org
Most of the existing work in one-stage referring expression comprehension (REC) mainly
focuses on multi-modal fusion and reasoning, while the influence of other factors in this task …

Rethinking Two-Stage Referring Expression Comprehension: A Novel Grounding and Segmentation Method Modulated by Point

P Zhao, S Zheng, W Zhao, D Xu, P Li, Y Cai… - Proceedings of the …, 2024 - ojs.aaai.org
As a fundamental and challenging task in the vision and language domain, Referring
Expression Comprehension (REC) has shown impressive improvements recently. However …

RefEgo: Referring expression comprehension dataset from first-person perception of Ego4D

S Kurita, N Katsura, E Onami - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Grounding textual expressions on scene objects from first-person views is a truly demanding
capability in developing agents that are aware of their surroundings and behave following …

Radar Spectra-Language Model for Automotive Scene Parsing

M Pushkareva, Y Feldman, C Domokos… - arXiv preprint arXiv …, 2024 - arxiv.org
Radar sensors are low-cost, long-range, and weather-resilient. Therefore, they are widely
used for driver assistance functions and are expected to be crucial for the success of …

Text-guided graph neural networks for referring 3D instance segmentation

PH Huang, HH Lee, HT Chen, TL Liu - Proceedings of the AAAI …, 2021 - ojs.aaai.org
This paper addresses a new task called referring 3D instance segmentation, which aims to
segment out the target instance in a 3D scene given a query sentence. Previous work on …

GroundNLQ@Ego4D natural language queries challenge 2023

Z Hou, L Ji, D Gao, W Zhong, K Yan, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ)
Challenge at CVPR 2023. Essentially, to accurately ground in a video, an effective …

UnstrPrompt: Large language model prompt for driving in unstructured scenarios

Y Li, L Li, Z Wu, Z Bing, Z Xuanyuan… - IEEE Journal of …, 2024 - ieeexplore.ieee.org
The integration of language descriptions or prompts with Large Language Models (LLMs)
into visual tasks is currently a focal point in the advancement of autonomous driving. This …