EDA: Explicit text-decoupling and dense alignment for 3D visual grounding

Y Wu, X Cheng, R Zhang, Z Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D visual grounding aims to find the object within point clouds mentioned by free-form natural language descriptions with rich semantic cues. However, existing methods …

FILM: Following instructions in language with modular methods

SY Min, DS Chaplot, P Ravikumar, Y Bisk… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent methods for embodied instruction following are typically trained end-to-end using
imitation learning. This often requires the use of expert trajectories and low-level language …

Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following

M Ding, Y Xu, Z Chen, DD Cox, P Luo… - … on robot learning, 2023 - proceedings.mlr.press
Humans, even at a very early age, can learn visual concepts and understand geometry and
layout through active interaction with the environment, and generalize their compositions to …

Episodic memory question answering

S Datta, S Dharur, V Cartillier, R Desai… - Proceedings of the …, 2022 - openaccess.thecvf.com
Egocentric augmented reality devices such as wearable glasses passively capture visual
data as a human wearer tours a home environment. We envision a scenario wherein the …

Learning 3D dynamic scene representations for robot manipulation

Z Xu, Z He, J Wu, S Song - arXiv preprint arXiv:2011.01968, 2020 - arxiv.org
3D scene representation for robot manipulation should capture three key object properties:
permanency--objects that become occluded over time continue to exist; amodal …

Visual language navigation: A survey and open challenges

SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer
With the recent development of deep learning, AI models are widely used in various
domains. AI models show good performance for definite tasks such as image classification …

Fast and explicit neural view synthesis

P Guo, MA Bautista, A Colburn… - Proceedings of the …, 2022 - openaccess.thecvf.com
We study the problem of novel view synthesis from sparse source observations of a scene
comprised of 3D objects. We propose a simple yet effective approach that is neither …

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

O Unal, C Sakaridis, S Saha, L Van Gool - European Conference on …, 2025 - Springer
3D visual grounding is the task of localizing the object in a 3D scene which is referred to by a description in natural language. With a wide range of applications ranging from …

Voxel-informed language grounding

R Corona, S Zhu, D Klein, T Darrell - arXiv preprint arXiv:2205.09710, 2022 - arxiv.org
Natural language applied to natural 2D images describes a fundamentally 3D world. We
present the Voxel-informed Language Grounder (VLG), a language grounding model that …

Multi-Attribute Interactions Matter for 3D Visual Grounding

C Xu, Y Han, R Xu, L Hui, J Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
3D visual grounding aims to localize 3D objects described by free-form language sentences. Following the detection-then-matching paradigm, existing methods mainly focus …