We present Unified-IO 2, a multimodal and multi-skill unified model capable of following novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …
Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org
Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and …
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across …
Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be …
Foundation models have rapidly permeated society, catalyzing a wave of generative AI applications spanning enterprise and consumer-facing contexts. While the societal impact of …
If generalist robots are to operate in truly unstructured environments, they need to be able to recognize and reason about novel objects and scenarios. Such objects and scenarios might …
We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which …
X Li, M Liu, H Zhang, C Yu, J Xu, H Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent progress in vision language foundation models has shown their ability to understand multimodal data and resolve complicated vision language tasks, including robotics …
NMM Shafiullah, A Rai, H Etukuru, Y Liu, I Misra… - arXiv preprint arXiv …, 2023 - arxiv.org
Throughout history, we have successfully integrated various machines into our homes. Dishwashers, laundry machines, stand mixers, and robot vacuums are a few recent …