DeViDe: Faceted medical knowledge for improved medical vision-language pre-training

H Luo, Z Zhou, C Royer, A Sekuboyina… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language pre-training for chest X-rays has made significant strides, primarily by
utilizing paired radiographs and radiology reports. However, existing approaches often face …

Imitate: Clinical prior guided hierarchical vision-language pre-training

C Liu, S Cheng, M Shi, A Shah, W Bai… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In the field of medical Vision-Language Pretraining (VLP), significant efforts have been
devoted to deriving text and image features from both clinical reports and associated …

Xlip: Cross-modal attention masked modelling for medical language-image pre-training

B Wu, Y Xie, Z Zhang, MH Phan, Q Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on
image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with …
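The "contrastive learning on image-text pairs" mentioned above is the common CLIP-style recipe shared by many of the works in this list. The following is a minimal, generic sketch of a symmetric InfoNCE image-text contrastive loss in PyTorch; it is an illustration of the general technique, not the specific method of XLIP or any other paper here, and the tensor shapes and temperature value are assumptions.

    import torch
    import torch.nn.functional as F

    def image_text_contrastive_loss(image_emb: torch.Tensor,
                                    text_emb: torch.Tensor,
                                    temperature: float = 0.07) -> torch.Tensor:
        """Symmetric InfoNCE loss over a batch of paired image/report embeddings.

        image_emb, text_emb: (batch, dim) tensors where row i of each is a matched pair.
        """
        # L2-normalize so the dot product becomes a cosine similarity
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)

        # (batch, batch) similarity matrix; the diagonal holds the true pairs
        logits = image_emb @ text_emb.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)

        # Cross-entropy in both directions: image-to-text and text-to-image
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2

    # Toy usage with random embeddings standing in for encoder outputs
    images = torch.randn(8, 512)
    reports = torch.randn(8, 512)
    print(image_text_contrastive_loss(images, reports).item())

In practice the two embedding matrices would come from a vision encoder applied to radiographs and a text encoder applied to the paired reports.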

Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray

Q Deng, Z Huang, Y Wang, Z Wang, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Medical vision-language pre-training has emerged as a promising approach for learning
domain-general representations of medical images and text. Current algorithms that exploit …

Medical image understanding with pretrained vision language models: A comprehensive study

Z Qin, H Yi, Q Lao, K Li - arXiv preprint arXiv:2209.15517, 2022 - arxiv.org
The large-scale pre-trained vision language models (VLM) have shown remarkable domain
transfer capability on natural images. However, it remains unknown whether this capability …

Knowledge-enhanced visual-language pre-training on chest radiology images

X Zhang, C Wu, Y Zhang, W Xie, Y Wang - Nature Communications, 2023 - nature.com
While multi-modal foundation models pre-trained on large-scale data have been successful
in natural language understanding and vision recognition, their use in medical domains is …

Freeze the Backbones: a Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-Training

J Qin, C Liu, S Cheng, Y Guo… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Modern healthcare often utilises radiographic images alongside textual reports for
diagnostics, encouraging the use of Vision-Language Self-Supervised Learning (VL-SSL) …
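The parameter-efficient idea named in this title (keeping pre-trained backbones frozen and training only small heads) can be sketched as follows. This is a generic, assumed setup: the linear "backbones", dimensions, and placeholder alignment loss are stand-ins, not the authors' architecture; in practice the frozen models would be, e.g., a pre-trained vision transformer and a clinical BERT, trained with a contrastive loss like the one sketched earlier.

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for pre-trained vision and text backbones
    vision_backbone = nn.Linear(1024, 768)   # pretend image-feature extractor
    text_backbone = nn.Linear(512, 768)      # pretend report-feature extractor

    # Freeze both backbones so only the small projection heads receive gradients
    for backbone in (vision_backbone, text_backbone):
        backbone.requires_grad_(False)

    # Trainable projection heads mapping both modalities into a shared space
    image_proj = nn.Linear(768, 256)
    text_proj = nn.Linear(768, 256)

    optimizer = torch.optim.AdamW(
        list(image_proj.parameters()) + list(text_proj.parameters()), lr=1e-4
    )

    # One illustrative step: features from the frozen backbones, gradients only for the heads
    img_feat = vision_backbone(torch.randn(8, 1024))
    txt_feat = text_backbone(torch.randn(8, 512))
    loss = nn.functional.mse_loss(image_proj(img_feat), text_proj(txt_feat))  # placeholder alignment loss
    loss.backward()
    optimizer.step()

Because the backbone parameters never enter the optimizer, memory and compute per update scale with the small projection heads rather than the full encoders.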

Utilizing synthetic data for medical vision-language pre-training: Bypassing the need for real images

C Liu, A Shah, W Bai, R Arcucci - arXiv preprint arXiv:2310.07027, 2023 - arxiv.org
Medical Vision-Language Pre-training (VLP) learns representations jointly from medical
images and paired radiology reports. It typically requires large-scale paired image-text …

M-flag: Medical vision-language pre-training with frozen language models and latent space geometry optimization

C Liu, S Cheng, C Chen, M Qiao, W Zhang… - … Conference on Medical …, 2023 - Springer
Medical vision-language models enable co-learning and integrating features from medical
imaging and clinical text. However, these models are not easy to train and the latent …

MOSMOS: Multi-organ segmentation facilitated by medical report supervision

W Tian, X Huang, J Hou, C Ren, L Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Owing to the large amount of multi-modal data in modern medical systems, such as medical
images and reports, Medical Vision-Language Pre-training (Med-VLP) has demonstrated …