A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning

E Salvato, G Fenu, E Medvet, FA Pellegrino - IEEE Access, 2021 - ieeexplore.ieee.org
The growing demand for robots able to act autonomously in complex scenarios has greatly
accelerated the introduction of Reinforcement Learning (RL) in robot control applications …

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing good generalization to real-world settings. However, particularly in …

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2024 - proceedings.neurips.cc
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

M Deitke, E VanderBilt, A Herrasti… - Advances in …, 2022 - proceedings.neurips.cc
Massive datasets and high-capacity models have driven many recent advancements in
computer vision and natural language understanding. This work presents a platform to …

SpatialVLM: Endowing vision-language models with spatial reasoning capabilities

B Chen, Z Xu, S Kirmani, B Ichter… - Proceedings of the …, 2024 - openaccess.thecvf.com
Understanding and reasoning about spatial relationships is crucial for Visual Question
Answering (VQA) and robotics. Vision Language Models (VLMs) have shown impressive …

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric …

Kubric: A scalable dataset generator

K Greff, F Belletti, L Beyer, C Doersch… - Proceedings of the …, 2022 - openaccess.thecvf.com
Data is the driving force of machine learning, with the amount and quality of training data
often being more important for the performance of a system than architecture and training …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

Simple but effective: CLIP embeddings for embodied AI

A Khandelwal, L Weihs, R Mottaghi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Contrastive language-image pretraining (CLIP) encoders have been shown to be beneficial
for a range of visual tasks from classification and detection to captioning and image …