BiomedGPT: a unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal tasks

K Zhang, J Yu, Z Yan, Y Liu, E Adhikarla, S Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we introduce a unified and generalist Biomedical Generative Pre-trained
Transformer (BiomedGPT) model, which leverages self-supervision on large and diverse …

A generalist vision–language foundation model for diverse biomedical tasks

K Zhang, R Zhou, E Adhikarla, Z Yan, Y Liu, J Yu… - Nature Medicine, 2024 - nature.com
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or
modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize …

CheXagent: Towards a foundation model for chest X-ray interpretation

Z Chen, M Varma, JB Delbrouck, M Paschali… - arXiv preprint arXiv …, 2024 - arxiv.org
Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice.
Recent advances in the development of vision-language foundation models (FMs) give rise …

A Generalist Learner for Multifaceted Medical Image Interpretation

HY Zhou, S Adithan, JN Acosta, EJ Topol… - arXiv preprint arXiv …, 2024 - arxiv.org
Current medical artificial intelligence systems are often limited to narrow applications,
hindering their widespread adoption in clinical practice. To address this limitation, we …

Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey

Q Lin, Y Zhu, X Mei, L Huang, J Ma, K He… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of artificial intelligence has constantly reshaped the field of
intelligent healthcare and medicine. As a vital technology, multimodal learning has …

Interpretable medical image visual question answering via multi-modal relationship graph learning

X Hu, L Gu, K Kobayashi, L Liu, M Zhang… - Medical Image …, 2024 - Elsevier
Medical Visual Question Answering (VQA) is an important task in medical
multi-modal Large Language Models (LLMs), aiming to answer clinically relevant questions …

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

P Chen, J Ye, G Wang, Y Li, Z Deng, W Li, T Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as
imaging, text, and physiological signals, and can be applied in various fields. In the medical …

Learning a multi-task transformer via unified and customized instruction tuning for chest radiograph interpretation

L Xu, Z Ni, X Liu, X Wang, H Li, S Zhang - arXiv preprint arXiv:2311.01092, 2023 - arxiv.org
The emergence of multi-modal deep learning models has made significant impacts on
clinical applications in the last decade. However, the majority of models are limited to single …

Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model

T Kim, Y Cho, H Shin, Y Jo, D Shin - arXiv preprint arXiv:2401.06400, 2024 - arxiv.org
Visual question answering (VQA) is a task where an image is given, and a series of
questions are asked about the image. To build an efficient VQA algorithm, a large amount of …

IKIM at MEDIQA-M3G 2024: Multilingual Visual Question-Answering for Dermatology through VLM Fine-tuning and LLM Translations

M Bauer, A Dada, C Seibold… - Proceedings of the 6th …, 2024 - aclanthology.org
This paper presents our solution to the MEDIQA-M3G Challenge at NAACL-ClinicalNLP
2024. We participated in all three languages, ranking first in Chinese and Spanish and third …