A theory of multimodal learning

Z Lu - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Human perception of the empirical world involves recognizing the diverse appearances,
or 'modalities', of underlying objects. Despite the longstanding consideration of this …

Multimodal vigilance estimation with modality-pairwise contrastive loss

M Zhang, Z Luo, L Xie, T Liu, Y Yan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Nowadays, how to estimate vigilance with higher accuracy has become a hot field of
research direction. Although the increasing available modalities opens the door for amazing …

On the Computational Benefit of Multimodal Learning

Z Lu - International Conference on Algorithmic Learning …, 2024 - proceedings.mlr.press
Human perception inherently operates in a multimodal manner. Similarly, as machines
interpret the empirical world, their learning processes ought to be multimodal. The recent …

Semi-Supervised Federated Analytics for Heterogeneous Household Characteristics Identification

W Chen, S Bu, X Zhang, Y Tao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The widespread use of smart meters in households paves the way for retailers to understand
household patterns through electricity usage data. This insight helps them offer personalized …

Understanding the robustness of multi-modal contrastive learning to distribution shift

Y Xue, S Joshi, D Nguyen, B Mirzasoleiman - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, multimodal contrastive learning (MMCL) approaches, such as CLIP, have
achieved a remarkable success in learning representations that are robust against …

A Multimodal Transformer Model for Recognition of Images from Complex Laparoscopic Surgical Videos

RH Abiyev, MZ Altabel, M Darwish, A Helwan - Diagnostics, 2024 - mdpi.com
The determination of the potential role and advantages of artificial intelligence-based
models in the field of surgery remains uncertain. This research marks an initial stride …

Messing With The Gap: On The Modality Gap Phenomenon In Multimodal Contrastive Representation Learning

M Al-Jaff - 2023 - diva-portal.org
In machine learning, a sub-field of computer science, a two-tower architecture model is a
specialised type of neural network model that encodes paired data from different modalities …

Advancing Robust and Aligned Measures of Semantic Similarity in Large Language Models

S Goel - 2024 - eecs.berkeley.edu
With the rapid advancement of large language models (LLMs) like OpenAI's ChatGPT,
Anthropic's Claude, and Google's Gemini, the technology's potential impact on society is …