CXR-CLIP: Toward large scale chest X-ray language-image pre-training

Z Zhao, Y Liu, H Wu, M Wang, Y Li, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …

Cited by 49 Related articles All 2 versions

[PDF] arxiv.org

A comprehensive survey of foundation models in medicine

W Khan, S Leem, KB See, JK Wong… - IEEE Reviews in …, 2025 - ieeexplore.ieee.org

Foundation models (FMs) are large-scale deep learning models that are developed using
large datasets and self-supervised learning methods. These models serve as a base for …

Cited by 3 Related articles

[PDF] arxiv.org

Improving medical multi-modal contrastive learning with expert annotations

Y Kumar, P Marttinen - European Conference on Computer Vision, 2024 - Springer

We introduce eCLIP, an enhanced version of the CLIP model that integrates expert
annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in …

Cited by 11 Related articles All 2 versions

[PDF] thecvf.com

FairCLIP: Harnessing fairness in vision-language learning

Y Luo, M Shi, MO Khan, MM Afzal… - Proceedings of the …, 2024 - openaccess.thecvf.com

Fairness is a critical concern in deep learning especially in healthcare where these models
influence diagnoses and treatment decisions. Although fairness has been investigated in the …

Cited by 23 Related articles All 3 versions

[PDF] arxiv.org

Fact-aware multimodal retrieval augmentation for accurate medical radiology report generation

L Sun, J Zhao, M Han, C Xiong - arXiv preprint arXiv:2407.15268, 2024 - arxiv.org

Multimodal foundation models hold significant potential for automating radiology report
generation, thereby assisting clinicians in diagnosing cardiac diseases. However, generated …

Cited by 5 Related articles All 3 versions

Core-Periphery Multi-Modality Feature Alignment for Zero-Shot Medical Image Analysis

X Yu, L Zhang, Z Wu, D Zhu - IEEE Transactions on Medical …, 2024 - ieeexplore.ieee.org

Multi-modality learning, exemplified by the language-image pair pre-trained CLIP model,
has demonstrated remarkable performance in enhancing zero-shot capabilities and has …

Cited by 3 Related articles All 2 versions

[PDF] arxiv.org

Medical vision language pretraining: A survey

P Shrestha, S Amgain, B Khanal, CA Linte… - arXiv preprint arXiv …, 2023 - arxiv.org

Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to
the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and …

Cited by 14 Related articles All 3 versions

[PDF] nowpublishers.com

[PDF] Automatic Medical Report Generation: Methods and Applications

L Guo, AM Tahir, D Zhang, ZJ Wang… - … Transactions on Signal …, 2024 - nowpublishers.com

The increasing demand for medical imaging has surpassed the capacity of available
radiologists, leading to diagnostic delays and potential misdiagnoses. Artificial intelligence …

Cited by 1 Related articles All 4 versions

[PDF] arxiv.org

Foundation model for advancing healthcare: Challenges, opportunities, and future directions

Y He, F Huang, X Jiang, Y Nie, M Wang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Foundation model, which is pre-trained on broad data and is able to adapt to a wide range
of tasks, is advancing healthcare. It promotes the development of healthcare artificial …

Cited by 22 Related articles All 2 versions

[PDF] thecvf.com

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

Y Xie, Q Chen, S Wang, MS To, I Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com

Current vision-language pre-training (VLP) methodologies predominantly depend on paired
image-text datasets, a resource that is challenging to acquire in radiology due to privacy …

Cited by 5 Related articles All 3 versions
