Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning

T Yu, D Quillen, Z He, R Julian… - … on robot learning, 2020 - proceedings.mlr.press
Meta-reinforcement learning algorithms can enable robots to acquire new skills much more
quickly, by leveraging prior experience to learn how to learn. However, much of the current …

Habitat: A platform for embodied AI research

M Savva, A Kadian, O Maksymets… - Proceedings of the …, 2019 - openaccess.thecvf.com
We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat
enables training embodied agents (virtual robots) in highly efficient photorealistic 3D …

SAPIEN: A simulated part-based interactive environment

F Xiang, Y Qin, K Mo, Y Xia, H Zhu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Building home assistant robots has long been a goal for vision and robotics researchers. To
achieve this task, a simulated environment with physically realistic simulation, sufficient …

On evaluation of embodied navigation agents

P Anderson, A Chang, DS Chaplot… - arXiv preprint arXiv …, 2018 - arxiv.org
Skillful mobile operation in three-dimensional environments is a primary topic of study in
Artificial Intelligence. The past two years have seen a surge of creative work on navigation …

Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments

P Anderson, Q Wu, D Teney, J Bruce… - Proceedings of the …, 2018 - openaccess.thecvf.com
A robot that can carry out a natural-language instruction has been a dream since before the
Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot …

Building cooperative embodied agents modularly with large language models

H Zhang, W Du, J Shan, Q Zhou, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive planning abilities in
single-agent embodied tasks across various domains. However, their capacity for planning and …

SoundSpaces: Audio-visual navigation in 3D environments

C Chen, U Jain, C Schissler, SVA Gari… - Computer Vision–ECCV …, 2020 - Springer
Moving around in the world is naturally a multisensory experience, but today's embodied
agents are deaf—restricted to solely their visual perception of the environment. We introduce …

RoboTHOR: An open simulation-to-real embodied AI platform

M Deitke, W Han, A Herrasti… - Proceedings of the …, 2020 - openaccess.thecvf.com
Visual recognition ecosystems (e.g., ImageNet, Pascal, COCO) have undeniably played a
prevailing role in the evolution of modern computer vision. We argue that interactive and …