A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging

Y Li, Y Liu, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - medRxiv, 2023 - medrxiv.org
This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse
medical imaging tasks, including Radiology Report Generation, Medical Visual Question …

A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - Computers in Biology …, 2024 - Elsevier
With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Medical Vision Language Pretraining: A survey

P Shrestha, S Amgain, B Khanal, CA Linte… - arXiv preprint arXiv …, 2023 - arxiv.org
Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to
the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and …

Enhancing medical vision-language contrastive learning via inter-matching relation modelling

M Li, M Meng, M Fulham, DD Feng, L Bi… - arXiv preprint arXiv …, 2024 - arxiv.org
Medical image representations can be learned through medical vision-language contrastive
learning (mVLCL) where medical imaging reports are used as weak supervision through …