PubMedCLIP: How much does CLIP benefit visual question answering in the medical domain?

S Eslami, C Meinel, G de Melo - Findings of the Association for …, 2023 - aclanthology.org
Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in
learning with cross-modal supervision from extensive amounts of image–text pairs …

Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?

S Eslami, G de Melo, C Meinel - arXiv preprint arXiv:2112.13906, 2021 - arxiv.org
Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in
learning with cross-modal supervision from extensive amounts of image–text pairs collected …

CLIPPO: Image-and-language understanding from pixels only

M Tschannen, B Mustafa… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Multimodal models are becoming increasingly effective, in part due to unified components,
such as the Transformer architecture. However, multimodal models still often consist of many …

Parameter-efficient transfer learning for medical visual question answering

J Liu, T Hu, Y Zhang, Y Feng, J Hao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The Contrastive Language-Image Pre-Training (CLIP) model, pretrained on large visual text
corpora, has demonstrated significant improvements in visual and linguistic tasks and has …

Self-supervised vision-language pretraining for medical visual question answering

P Li, G Liu, L Tan, J Liao… - 2023 IEEE 20th …, 2023 - ieeexplore.ieee.org
Medical image visual question answering (VQA) is the task of answering clinical questions
about a given radiographic image, a challenging problem that requires a model to integrate both …

Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering

P Li, G Liu, J He, Z Zhao, S Zhong - International Conference on Medical …, 2023 - Springer
Medical visual question answering (VQA) is a challenging task that requires answering
clinical questions about a given medical image by taking into account both visual and language …

REVEAL: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory

Z Hu, A Iscen, C Sun, Z Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model
(REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve …

Cross-modal self-attention with multi-task pre-training for medical visual question answering

H Gong, G Chen, S Liu, Y Yu, G Li - Proceedings of the 2021 …, 2021 - dl.acm.org
Due to the severe lack of labeled data, existing methods of medical visual question
answering usually rely on transfer learning to obtain effective image feature representation …

Answer-Me: Multi-task open-vocabulary visual question answering

AJ Piergiovanni, W Li, W Kuo, M Saffar… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Answer-Me, a task-aware multi-task framework that unifies a variety of
question answering tasks, such as visual question answering, visual entailment, visual …

AMAM: an attention-based multimodal alignment model for medical visual question answering

H Pan, S He, K Zhang, B Qu, C Chen, K Shi - Knowledge-Based Systems, 2022 - Elsevier
Medical Visual Question Answering (VQA) is a multimodal task to answer clinical
questions about medical images. Existing methods have achieved good performance, but …