A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Objaverse: A universe of annotated 3D objects

M Deitke, D Schwenk, J Salvador… - Proceedings of the …, 2023 - openaccess.thecvf.com
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and
LAION have propelled recent dramatic progress in AI. Large neural models trained on such …

3D-LLM: Injecting the 3D world into large language models

Y Hong, H Zhen, P Chen, S Zheng… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) and vision-language models (VLMs) have been proven to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …

Challenges and solutions for autonomous ground robot scene understanding and navigation in unstructured outdoor environments: A review

L Wijayathunga, A Rassau, D Chai - Applied Sciences, 2023 - mdpi.com
The capabilities of autonomous mobile robotic systems have been steadily improving due to
recent advancements in computer science, engineering, and related disciplines such as …

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Diffusion-based generation, optimization, and planning in 3D scenes

S Huang, Z Wang, P Li, B Jia, T Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce SceneDiffuser, a conditional generative model for 3D scene understanding.
SceneDiffuser provides a unified model for solving scene-conditioned generation …

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2024 - proceedings.neurips.cc
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

Habitat 2.0: Training home assistants to rearrange their habitat

A Szot, A Clegg, E Undersander… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in
interactive 3D environments and complex physics-enabled scenarios. We make …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

The unsurprising effectiveness of pre-trained vision models for control

S Parisi, A Rajeswaran… - … on machine learning, 2022 - proceedings.mlr.press
Recent years have seen the emergence of pre-trained representations as a powerful
abstraction for AI applications in computer vision, natural language, and speech. However …