A survey on active simultaneous localization and mapping: State of the art and new frontiers

JA Placed, J Strader, H Carrillo… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Active simultaneous localization and mapping (SLAM) is the problem of planning and
controlling the motion of a robot to build the most accurate and complete model of the …

Foundations and trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Voyager: An open-ended embodied agent with large language models

G Wang, Y Xie, Y Jiang, A Mandlekar, C Xiao… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft
that continuously explores the world, acquires diverse skills, and makes novel discoveries …

3D-LLM: Injecting the 3D world into large language models

Y Hong, H Zhen, P Chen, S Zheng… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) and Vision-Language Models (VLMs) have been proved to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

ChatGPT for robotics: Design principles and model abilities

SH Vemprala, R Bonatti, A Bucker, A Kapoor - IEEE Access, 2024 - ieeexplore.ieee.org
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for
robotics applications. We outline a strategy that combines design principles for prompt …

MineDojo: Building open-ended embodied agents with internet-scale knowledge

L Fan, G Wang, Y Jiang, A Mandlekar… - Advances in …, 2022 - proceedings.neurips.cc
Autonomous agents have made great strides in specialist domains like Atari games and Go.
However, they typically learn tabula rasa in isolated environments with limited and manually …

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Generative novel view synthesis with 3D-aware diffusion models

ER Chan, K Nagano, MA Chan… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a diffusion-based model for 3D-aware generative novel view synthesis from as
few as a single input image. Our model samples from the distribution of possible renderings …

UniSim: A neural closed-loop sensor simulator

Z Yang, Y Chen, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Rigorously testing autonomy systems is essential for making safe self-driving vehicles (SDV)
a reality. It requires one to generate safety critical scenarios beyond what can be collected …