Neural Scaling Laws for Embodied AI

S Sartor, N Thompson - arXiv preprint arXiv:2405.14005, 2024 - arxiv.org
Scaling laws have driven remarkable progress across machine learning domains like
language modeling and computer vision. However, the exploration of scaling laws in …

Recommendations for designing conversational companion robots with older adults through foundation models

B Irfan, S Kuoppamäki, G Skantze - Frontiers in Robotics and AI, 2024 - frontiersin.org
Companion robots are aimed to mitigate loneliness and social isolation among older adults
by providing social and emotional support in their everyday lives. However, older adults' …

Logic Learning from Demonstrations for Multi-step Manipulation Tasks in Dynamic Environments

Y Zhang, T Xue, A Razmjoo… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Learning from Demonstration (LfD) stands as an efficient framework for imparting human-like
skills to robots. Nevertheless, designing an LfD framework capable of seamlessly …

Beyond Text: Improving LLM's Decision Making for Robot Navigation via Vocal Cues

X Sun, H Meng, S Chakraborty, AS Bedi… - arXiv preprint arXiv …, 2024 - arxiv.org
This work highlights a critical shortcoming in text-based Large Language Models (LLMs)
used for human-robot interaction, demonstrating that text alone as a conversation modality …

PWM: Policy Learning with Large World Models

I Georgiev, V Giridhar, N Hansen, A Garg - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning (RL) has achieved impressive results on complex tasks but
struggles in multi-task settings with different embodiments. World models offer scalability by …

GPT-Fabric: Folding and Smoothing Fabric by Leveraging Pre-Trained Foundation Models

V Raval, E Zhao, H Zhang, S Nikolaidis… - arXiv preprint arXiv …, 2024 - arxiv.org
Fabric manipulation has applications in folding blankets, handling patient clothing, and
protecting items with covers. It is challenging for robots to perform fabric manipulation since …

LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control

D Qu, Q Chen, P Zhang, X Gao, B Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper aims to advance the progress of physical world interactive scene reconstruction
by extending the interactive object reconstruction from single object level to complex scene …

DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement

BA Newman, P Gupta, K Kitani, Y Bisk… - arXiv preprint arXiv …, 2024 - arxiv.org
De gustibus non est disputandum ("there is no accounting for others' tastes") is a common
Latin maxim describing how many solutions in life are determined by people's personal …

CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments

AJ Sathyamoorthy, K Weerakoon, M Elnoor… - arXiv preprint arXiv …, 2024 - arxiv.org
We present ConVOI, a novel method for autonomous robot navigation in real-world indoor
and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two …

MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models

J Li, K Lammers, X Yin, X Yin, L He, R Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Fruit harvesting poses a significant labor and financial burden for the industry, highlighting
the critical need for advancements in robotic harvesting solutions. Machine vision-based fruit …