A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Cited by 194 · Related articles · All 8 versions

Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning

E Salvato, G Fenu, E Medvet, FA Pellegrino - IEEE Access, 2021 - ieeexplore.ieee.org

The growing demand for robots able to act autonomously in complex scenarios has widely
accelerated the introduction of Reinforcement Learning (RL) in robot control applications …

Cited by 120 · Related articles · All 4 versions

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press

Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Cited by 257 · Related articles · All 5 versions

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2024 - proceedings.neurips.cc

We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

Cited by 85 · Related articles · All 6 versions

🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

M Deitke, E VanderBilt, A Herrasti… - Advances in …, 2022 - proceedings.neurips.cc

Massive datasets and high-capacity models have driven many recent advancements in
computer vision and natural language understanding. This work presents a platform to …

Cited by 133 · Related articles · All 5 versions

SpatialVLM: Endowing vision-language models with spatial reasoning capabilities

B Chen, Z Xu, S Kirmani, B Ichter… - Proceedings of the …, 2024 - openaccess.thecvf.com

Understanding and reasoning about spatial relationships is crucial for Visual Question
Answering (VQA) and robotics. Vision Language Models (VLMs) have shown impressive …

Cited by 30 · Related articles · All 5 versions

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Cited by 34 · Related articles · All 5 versions

Kubric: A scalable dataset generator

K Greff, F Belletti, L Beyer, C Doersch… - Proceedings of the …, 2022 - openaccess.thecvf.com

Data is the driving force of machine learning, with the amount and quality of training data
often being more important for the performance of a system than architecture and training …

Cited by 147 · Related articles · All 5 versions

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org

Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

Cited by 68 · Related articles · All 8 versions

Simple but effective: CLIP embeddings for embodied AI

A Khandelwal, L Weihs, R Mottaghi… - Proceedings of the …, 2022 - openaccess.thecvf.com

Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial
for a range of visual tasks from classification and detection to captioning and image …

Cited by 187 · Related articles · All 5 versions
