Unifying 3D Vision-Language Understanding via Promptable Queries

Z Zhu, Z Zhang, X Ma, X Niu, Y Chen, B Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
A unified model for 3D vision-language (3D-VL) understanding is expected to take various
scene representations and perform a wide range of tasks in a 3D scene. However, a …

UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet

J Ye, J Tian, M Yan, H Xu, Q Ye, Y Shi, X Yang… - ACM Transactions on … - dl.acm.org
Referring expression comprehension aims to align natural language queries with visual
scenes, which requires establishing fine-grained correspondence between vision and …

PointLLM: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org
The unprecedented advancements in Large Language Models (LLMs) have had a profound impact on
natural language processing but have yet to fully embrace the realm of 3D …

What goes beyond multi-modal fusion in one-stage referring expression comprehension: An empirical study

G Luo, Y Zhou, J Sun, S Huang, X Sun, Q Ye… - arXiv preprint arXiv …, 2022 - arxiv.org
Most of the existing work in one-stage referring expression comprehension (REC) mainly
focuses on multi-modal fusion and reasoning, while the influence of other factors in this task …

Rethinking Two-Stage Referring Expression Comprehension: A Novel Grounding and Segmentation Method Modulated by Point

P Zhao, S Zheng, W Zhao, D Xu, P Li, Y Cai… - Proceedings of the …, 2024 - ojs.aaai.org
As a fundamental and challenging task in the vision and language domain, Referring
Expression Comprehension (REC) has shown impressive improvements recently. However …

RefEgo: Referring expression comprehension dataset from first-person perception of Ego4D

S Kurita, N Katsura, E Onami - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Grounding textual expressions on scene objects from first-person views is a truly demanding
capability in developing agents that are aware of their surroundings and behave following …

Radar Spectra-Language Model for Automotive Scene Parsing

M Pushkareva, Y Feldman, C Domokos… - arXiv preprint arXiv …, 2024 - arxiv.org
Radar sensors are low-cost, long-range, and weather-resilient. Therefore, they are widely
used for driver assistance functions and are expected to be crucial for the success of …

Text-guided graph neural networks for referring 3D instance segmentation

PH Huang, HH Lee, HT Chen, TL Liu - Proceedings of the AAAI …, 2021 - ojs.aaai.org
This paper addresses a new task called referring 3D instance segmentation, which aims to
segment out the target instance in a 3D scene given a query sentence. Previous work on …

GroundNLQ@Ego4D natural language queries challenge 2023

Z Hou, L Ji, D Gao, W Zhong, K Yan, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ)
Challenge at CVPR 2023. Essentially, to accurately ground in a video, an effective …

UnstrPrompt: Large language model prompt for driving in unstructured scenarios

Y Li, L Li, Z Wu, Z Bing, Z Xuanyuan… - IEEE Journal of …, 2024 - ieeexplore.ieee.org
The integration of language descriptions or prompts with Large Language Models (LLMs)
into visual tasks is currently a focal point in the advancement of autonomous driving. This …