In recent years, deep learning algorithms have rapidly revolutionized artificial intelligence, particularly machine learning, enabling researchers and practitioners to extend previously …
Vision-language representation learning largely benefits from image-text alignment through contrastive losses (e.g., the InfoNCE loss). The success of this alignment strategy is attributed to …
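As a minimal sketch of the kind of objective this snippet refers to, the following shows a symmetric InfoNCE loss over a batch of paired image and text embeddings, CLIP-style. The function name, argument names, and the temperature value are illustrative assumptions, not taken from any of the cited works.

```python
# Minimal sketch (assumption): symmetric InfoNCE over paired image/text embeddings.
import torch
import torch.nn.functional as F

def info_nce(image_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # Matched pairs sit on the diagonal, so the target class for row i is i.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Symmetric loss: image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Minimizing this loss pulls each image embedding toward its paired text embedding and pushes it away from the other texts in the batch, which is the alignment behaviour the snippet attributes the success of these methods to.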
Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
3D visual language reasoning plays an important role in effective human-computer interaction. The current approaches for 3D visual reasoning are task-specific and lack pre …
In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and …
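The snippet is truncated before the masking scheme is described, so the following is only a generic sketch of MLM-style random token masking, not the paper's method; the mask ratio, the -100 ignore label, and the function name are assumptions for illustration.

```python
# Generic sketch (assumption): random token masking as used in MLM-style objectives.
import torch

def random_mask(tokens: torch.Tensor, mask_token_id: int, mask_ratio: float = 0.15):
    # tokens: (batch, seq_len) integer ids for either modality's token stream.
    mask = torch.rand(tokens.shape) < mask_ratio      # positions to hide
    masked_tokens = tokens.clone()
    masked_tokens[mask] = mask_token_id               # replace with the [MASK] id
    labels = torch.full_like(tokens, -100)            # -100 = ignored by cross_entropy
    labels[mask] = tokens[mask]                       # predict original ids only here
    return masked_tokens, labels, mask
```

A V+L model trained this way would receive the masked streams and be supervised only on the masked positions via a cross-entropy loss with ignore_index=-100.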
This paper introduces a novel task called Cross Modal Generalization (CMG), which addresses the challenge of learning a unified discrete representation from paired …
Y Xu, H Chen - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Survival prediction is a complicated ordinal regression task that aims to predict the ranking risk of death, which generally benefits from the integration of histology and genomic data …
X Hu, C Zhang, Y Zhang, B Hai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Pre-trained Visual-Language Models (VLMs), such as CLIP, have shown enhanced performance across a range of tasks that involve the integration of visual and linguistic …
Contrastive loss has been increasingly used in learning representations from multiple modalities. In the limit, the nature of the contrastive loss encourages modalities to exactly …
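To make the "exact alignment" notion concrete, a simple diagnostic is the average cosine similarity of matched image-text pairs, which would approach 1 in the limiting case the snippet describes. This is an illustrative measurement sketch, with assumed names, not a procedure from the cited work.

```python
# Illustrative sketch (assumption): how closely paired image/text embeddings align.
import torch
import torch.nn.functional as F

def paired_alignment(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Cosine similarity of each matched pair, averaged over the batch;
    # exact alignment corresponds to this value reaching 1.
    return (image_emb * text_emb).sum(dim=-1).mean()
```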
Vision and Language Pretraining has become the prevalent approach for tackling multimodal downstream tasks. The current trend is to move towards ever larger models and …