Multimodal Artificial Intelligence (Multimodal AI) generally involves various types of data (e.g., images, text, or data collected from different sensors), feature engineering (e.g., …
This paper introduces a novel task called Cross Modal Generalization (CMG), which addresses the challenge of learning a unified discrete representation from paired …
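As a rough illustration of what a unified discrete representation across paired modalities can look like (not the paper's exact method), the sketch below quantizes features from two modality-specific encoders against a single shared codebook; the encoder sizes and codebook dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedCodebookQuantizer(nn.Module):
    """Quantize features from any modality against one shared codebook,
    so paired inputs can map to the same discrete tokens."""
    def __init__(self, num_codes=512, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                          # z: (batch, dim) continuous features
        d = torch.cdist(z, self.codebook.weight)   # distances to every code vector
        idx = d.argmin(dim=-1)                     # nearest-code indices = discrete tokens
        z_q = self.codebook(idx)                   # quantized vectors
        z_q = z + (z_q - z).detach()               # straight-through estimator for gradients
        return z_q, idx

# Separate encoders per modality, one shared codebook (sizes are placeholders).
audio_enc = nn.Linear(128, 256)
video_enc = nn.Linear(512, 256)
quant = SharedCodebookQuantizer()

audio_tokens = quant(audio_enc(torch.randn(4, 128)))[1]
video_tokens = quant(video_enc(torch.randn(4, 512)))[1]
```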
Vision-Language Models (VLMs) have excelled in the image domain, especially in zero-shot settings, thanks to the availability of vast pretraining data (i.e., paired image-text …
H Yan, Q Cui, J Xie, S Guo - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In the context of computer vision and human-robot interaction, forecasting 3D human poses is crucial for understanding human behavior and enhancing the predictive capabilities of …
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality …
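The snippet below is a generic illustration of one common way to cope with missing or unseen modality combinations (it is not stated to be this paper's approach): randomly dropping whole modalities during training so the downstream fusion model learns to operate on whatever subset is present.

```python
import random
import torch

def drop_modalities(features: dict, p_drop: float = 0.3) -> dict:
    """Randomly zero out whole modalities during training, keeping at least one,
    so the fusion model sees many different modality subsets."""
    names = list(features)
    keep = [n for n in names if random.random() > p_drop]
    if not keep:                      # never drop every modality at once
        keep = [random.choice(names)]
    return {n: (f if n in keep else torch.zeros_like(f)) for n, f in features.items()}

batch = {"image": torch.randn(8, 256), "audio": torch.randn(8, 256), "text": torch.randn(8, 256)}
batch = drop_modalities(batch)        # e.g., audio may be masked for this batch
```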
S Srivastava, G Sharma - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We present a novel multimodal multitask network and associated training algorithm. The method can ingest data from approximately 12 different modalities, namely …
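A minimal sketch of the general pattern such a network implies: lightweight per-modality projections feeding a shared backbone with task-specific heads. The layer sizes, modality list, and task names below are placeholders, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class MultimodalMultitaskNet(nn.Module):
    """Per-modality projections -> shared backbone -> task-specific heads."""
    def __init__(self, modality_dims: dict, task_dims: dict, hidden: int = 512):
        super().__init__()
        self.proj = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in modality_dims.items()})
        self.backbone = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, d) for t, d in task_dims.items()})

    def forward(self, inputs: dict, task: str):
        # Project each available modality into the shared space and average the results.
        shared = torch.stack([self.proj[m](x) for m, x in inputs.items()]).mean(dim=0)
        return self.heads[task](self.backbone(shared))

# Illustrative modalities and tasks only.
net = MultimodalMultitaskNet({"rgb": 2048, "audio": 128, "text": 768},
                             {"classification": 1000, "retrieval": 256})
logits = net({"rgb": torch.randn(2, 2048), "audio": torch.randn(2, 128)}, task="classification")
```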
Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions …
J Shi, L Yu, Q Cheng, X Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Brain tumor segmentation is a fundamental task, and existing approaches usually rely on multi-modality magnetic resonance imaging (MRI) for accurate segmentation …
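As a hedged illustration of the standard input arrangement (not this paper's pipeline), multi-modality MRI is commonly stacked along the channel dimension before being passed to a segmentation network; the four-channel layout and tiny network below assume the usual T1/T1ce/T2/FLAIR sequences and are purely illustrative.

```python
import torch
import torch.nn as nn

# Stack the four common MRI sequences (T1, T1ce, T2, FLAIR) as input channels.
t1, t1ce, t2, flair = (torch.randn(1, 1, 64, 64, 64) for _ in range(4))
volume = torch.cat([t1, t1ce, t2, flair], dim=1)         # (batch, 4, D, H, W)

# A deliberately tiny 3D conv head standing in for a full segmentation network.
seg_net = nn.Sequential(
    nn.Conv3d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 4, kernel_size=1),                      # assumed 4 classes: background + 3 tumor sub-regions
)
logits = seg_net(volume)                                  # per-voxel class scores
mask = logits.argmax(dim=1)                               # predicted segmentation mask
```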
Training deep learning models for video classification from audio-visual data commonly requires vast amounts of labeled training data collected via a costly process. A challenging …