PubMedCLIP: How much does CLIP benefit visual question answering in the medical domain?

S Eslami, C Meinel, G de Melo - Findings of the Association for …, 2023 - aclanthology.org
Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in
learning with cross-modal supervision from extensive amounts of image–text pairs …

Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?

S Eslami, G de Melo, C Meinel - arXiv preprint arXiv:2112.13906, 2021 - arxiv.org
Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in
learning with cross-modal supervision from extensive amounts of image–text pairs collected …

CLIPPO: Image-and-language understanding from pixels only

M Tschannen, B Mustafa… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Multimodal models are becoming increasingly effective, in part due to unified components,
such as the Transformer architecture. However, multimodal models still often consist of many …

Parameter-efficient transfer learning for medical visual question answering

J Liu, T Hu, Y Zhang, Y Feng, J Hao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The Contrastive Language-Image Pre-Training (CLIP) model, pretrained on large visual text
corpora, has demonstrated significant improvements in visual and linguistic tasks and has …

Self-supervised vision-language pretraining for medical visual question answering

P Li, G Liu, L Tan, J Liao… - 2023 IEEE 20th …, 2023 - ieeexplore.ieee.org
Medical image visual question answering (VQA) is the task of answering clinical questions
about a given radiographic image, a challenging problem that requires a model to integrate both …

Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering

P Li, G Liu, J He, Z Zhao, S Zhong - International Conference on Medical …, 2023 - Springer
Medical visual question answering (VQA) is a challenging task that requires answering
clinical questions about a given medical image by taking into account both visual and language …

REVEAL: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory

Z Hu, A Iscen, C Sun, Z Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model
(REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve …

Cross-modal self-attention with multi-task pre-training for medical visual question answering

H Gong, G Chen, S Liu, Y Yu, G Li - Proceedings of the 2021 …, 2021 - dl.acm.org
Due to the severe lack of labeled data, existing methods of medical visual question
answering usually rely on transfer learning to obtain effective image feature representation …

Answer-Me: Multi-task open-vocabulary visual question answering

AJ Piergiovanni, W Li, W Kuo, M Saffar… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Answer-Me, a task-aware multi-task framework that unifies a variety of
question answering tasks, such as visual question answering, visual entailment, visual …

AMAM: an attention-based multimodal alignment model for medical visual question answering

H Pan, S He, K Zhang, B Qu, C Chen, K Shi - Knowledge-Based Systems, 2022 - Elsevier
Medical Visual Question Answering (VQA) is a multimodal task to answer clinical
questions about medical images. Existing methods have achieved good performance, but …