Achieving cross modal generalization with multimodal unified representation

Y Xia, H Huang, J Zhu, Z Zhao - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper introduces a novel task called Cross Modal Generalization (CMG), which
addresses the challenge of learning a unified discrete representation from paired …

A Survey on Knowledge Distillation: Recent Advancements

A Moslemi, A Briskina, Z Dang, J Li - Machine Learning with Applications, 2024 - Elsevier
Deep learning has achieved notable success across academia, medicine, and industry. Its
ability to identify complex patterns in large-scale data and to manage millions of parameters …

Learning modality knowledge alignment for cross-modality transfer

W Ma, S Li, L Cai, J Kang - arXiv preprint arXiv:2406.18864, 2024 - arxiv.org
Cross-modality transfer aims to leverage large pretrained models to complete tasks that may
not belong to the modality of pretraining data. Existing works achieve certain success in …

Uncovering the hidden dynamics of video self-supervised learning under distribution shifts

P Sarkar, A Beirami, A Etemad - Advances in Neural …, 2024 - proceedings.neurips.cc
Video self-supervised learning (VSSL) has made significant progress in recent years.
However, the exact behavior and dynamics of these models under different forms of …

Breaking modality gap in RGBT tracking: Coupled knowledge distillation

A Lu, J Zhao, C Li, Y Xiao, B Luo - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Modality gap between RGB and thermal infrared (TIR) images is a crucial issue but often
overlooked in existing RGBT tracking methods. It can be observed that modality gap mainly …

A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

L Vilaca, Y Yu, P Vinan - arXiv preprint arXiv:2412.00049, 2024 - arxiv.org
Audio-visual correlation learning aims to capture and understand natural phenomena
between audio and visual data. The rapid growth of Deep Learning propelled the …

Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition

MH Aslam, M Pedersoli, AL Koerich… - arXiv preprint arXiv …, 2024 - arxiv.org
Human emotion is a complex phenomenon conveyed and perceived through facial
expressions, vocal tones, body language, and physiological signals. Multimodal emotion …

Discom-kd: Cross-modal knowledge distillation via disentanglement representation and adversarial learning

D Ienco, CF Dantas - arXiv preprint arXiv:2408.07080, 2024 - arxiv.org
Cross-modal knowledge distillation (CMKD) refers to the scenario in which a learning
framework must handle training and test data that exhibit a modality mismatch, more …

EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

J Kim, H Lee, K Rho, J Kim, JS Chung - arXiv preprint arXiv:2403.09502, 2024 - arxiv.org
Recent advancements in self-supervised audio-visual representation learning have
demonstrated its potential to capture rich and comprehensive representations. However …

Virtual Visual-Guided Domain-Shadow Fusion via Modal Exchanging for Domain-Specific Multi-Modal Neural Machine Translation

Z Hou, J Guo - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Incorporating domain-specific visual information into text poses one of the critical challenges
for domain-specific multi-modal neural machine translation (DMNMT). While most existing …