Multimodal Vision Language Models (VLMs) have emerged as a transformative technology at the intersection of computer vision and natural language processing, enabling machines …
We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-self operation across diverse datasets and …
J Zhang, R Zhu, E Ohn-Bar - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Effectively utilizing the vast amounts of ego-centric navigation data that is freely available on the internet can advance generalized intelligent systems, ie, to robustly scale across …
We introduce a novel vision-and-language navigation (VLN) task of learning to provide real- time guidance to a blind follower situated in complex dynamic navigation scenarios …
HJ Kim, E Ohn-Bar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Abstract We introduce Motion Diversification Networks a novel framework for learning to generate realistic and diverse 3D human motion. Despite recent advances in deep …
J Zhang, Z Huang, A Ray… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
While behavior cloning has recently emerged as a highly successful paradigm for autonomous driving humans rarely learn to perform complex tasks such as driving via …
Embodied vision-based real-world systems, such as mobile robots, require a careful balance between energy consumption, compute latency, and safety constraints to optimize …
People who are blind perceive the world differently than those who are sighted, which can result in distinct motion characteristics. For instance, when crossing at an intersection, blind …
Models for student reading performance can empower educators and institutions to proactively identify at-risk students, thereby enabling early and tailored instructional …