Synthetic Vision: Training Vision-Language Models to Understand Physics

V Balazadeh, M Ataei, H Cheong… - arXiv preprint arXiv …, 2024 - arxiv.org
Physical reasoning, which involves the interpretation, understanding, and prediction of
object behavior in dynamic environments, remains a significant challenge for current Vision …

BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

K Weerakoon, M Elnoor, G Seneviratne… - arXiv preprint arXiv …, 2024 - arxiv.org
We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes
guided by human instructions and leveraging Vision Language Models (VLMs). Our method …