Omnivec: Learning robust representations with cross modal sharing

S Srivastava, G Sharma - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The majority of research in learning-based methods has been towards designing and training
networks for specific tasks. However, many of the learning-based tasks, across modalities …

Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g. …

Achieving cross modal generalization with multimodal unified representation

Y Xia, H Huang, J Zhu, Z Zhao - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper introduces a novel task called Cross Modal Generalization (CMG), which
addresses the challenge of learning a unified discrete representation from paired …

Victr: Video-conditioned text representations for activity recognition

K Kahatapitiya, A Arnab, A Nagrani… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-Language models (VLMs) have excelled in the image-domain, especially in
zero-shot settings, thanks to the availability of vast pretraining data (i.e. paired image-text …

Forecasting of 3D Whole-body Human Poses with Grasping Objects

H Yan, Q Cui, J Xie, S Guo - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In the context of computer vision and human-robot interaction, forecasting 3D human poses
is crucial for understanding human behavior and enhancing the predictive capabilities of …

Learning unseen modality interaction

Y Zhang, H Doughty, C Snoek - Advances in Neural …, 2024 - proceedings.neurips.cc
Multimodal learning assumes all modality combinations of interest are available during
training to learn cross-modal correspondences. In this paper, we challenge this modality …

OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning

S Srivastava, G Sharma - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We present a novel multimodal multitask network and associated training algorithm. The
method is capable of ingesting data from approximately 12 different modalities namely …

Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

R Wu, H Wang, F Dayoub, HT Chen - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face
challenges in user safety, particularly concerning oncoming vehicles. While some solutions …

M²FTrans: Modality-Masked Fusion Transformer for Incomplete Multi-Modality Brain Tumor Segmentation

J Shi, L Yu, Q Cheng, X Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Brain tumor segmentation is a fundamental task and existing approaches usually rely on
multi-modality magnetic resonance imaging (MRI) images for accurate segmentation …

Text-to-feature diffusion for audio-visual few-shot learning

OB Mercea, T Hummel, AS Koepke, Z Akata - DAGM German Conference …, 2023 - Springer
Training deep learning models for video classification from audio-visual data commonly
requires vast amounts of labeled training data collected via a costly process. A challenging …