CLIP in medical imaging: A comprehensive survey

Z Zhao, Y Liu, H Wu, M Wang, Y Li, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …

LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day

C Li, C Wong, S Zhang, N Usuyama… - Advances in …, 2024 - proceedings.neurips.cc
Conversational generative AI has demonstrated remarkable promise for empowering
biomedical practitioners, but current investigations focus on unimodal text. Multimodal …

Evaluation and mitigation of the limitations of large language models in clinical decision-making

P Hager, F Jungmann, R Holland, K Bhagat… - Nature medicine, 2024 - nature.com
Clinical decision-making is one of the most impactful parts of a physician's responsibilities
and stands to benefit greatly from artificial intelligence solutions and large language models …

Towards generalist biomedical AI

T Tu, S Azizi, D Driess, M Schaekermann, M Amin… - NEJM AI, 2024 - ai.nejm.org
Background Medicine is inherently multimodal, requiring the simultaneous interpretation
and integration of insights between many data modalities spanning text, imaging, genomics …

A generalist vision–language foundation model for diverse biomedical tasks

K Zhang, R Zhou, E Adhikarla, Z Yan, Y Liu, J Yu… - Nature Medicine, 2024 - nature.com
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or
modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize …

Prompt engineering for healthcare: Methodologies and applications

J Wang, E Shi, S Yu, Z Wu, C Ma, H Dai, Q Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt engineering is a critical technique in the field of natural language processing that
involves designing and optimizing the prompts used to input information into models, aiming …

BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal tasks

K Zhang, J Yu, E Adhikarla, R Zhou, Z Yan… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Conventional task- and modality-specific artificial intelligence (AI) models are inflexible in
real-world deployment and maintenance for biomedicine. At the same time, the growing …

MedBLIP: Bootstrapping language-image pre-training from 3D medical images and texts

Q Chen, Y Hong - Proceedings of the Asian Conference on …, 2024 - openaccess.thecvf.com
Vision-language pretraining (VLP) models have proven effective in numerous computer
vision applications. In this paper, we focus on developing a VLP model for the medical …

Foundation model for advancing healthcare: Challenges, opportunities, and future directions

Y He, F Huang, X Jiang, Y Nie, M Wang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models, which are pre-trained on broad data and can adapt to a wide range
of tasks, are advancing healthcare. They promote the development of healthcare artificial …

MedTrinity-25M: A large-scale multimodal dataset with multigranular annotations for medicine

Y Xie, C Zhou, L Gao, J Wu, X Li, HY Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for
medicine, covering over 25 million images across 10 modalities, with multigranular …