MAIRA-2: Grounded radiology report generation

S Bannur, K Bouzid, DC Castro, A Schwaighofer… - arXiv preprint arXiv …, 2024 - arxiv.org
Radiology reporting is a complex task requiring detailed medical image understanding and
precise language generation, for which generative multimodal models offer a promising …

ChEX: Interactive localization and region description in chest X-rays

P Müller, G Kaissis, D Rueckert - European Conference on Computer …, 2025 - Springer
Report generation models offer fine-grained textual interpretations of medical images like
chest X-rays, yet they often lack interactivity (i.e., the ability to steer the generation process …

A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis

Y Liu, Y Li, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - Meta-Radiology, 2024 - Elsevier
This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on
three representative tasks: radiology report generation, medical visual question answering …

A comprehensive study of GPT-4V's multimodal capabilities in medical imaging

Y Li, Y Liu, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - medRxiv, 2023 - medrxiv.org
This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse
medical imaging tasks, including Radiology Report Generation, Medical Visual Question …

MedRG: Medical Report Grounding with Multi-modal Large Language Model

K Zou, Y Bai, Z Chen, Y Zhou, Y Chen, K Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Medical Report Grounding is pivotal in identifying the most relevant regions in medical
images based on a given phrase query, a critical aspect in medical image analysis and …

Automatic medical report generation combining contrastive learning and feature difference

C Lyu, C Qiu, K Han, S Li, VS Sheng, H Rong… - Knowledge-Based …, 2024 - Elsevier
Automatic medical report generation is a challenging task because it requires accurate
capture and description of abnormal regions, especially for those discrepancies between …

Towards Visual Grounding: A Survey

L Xiao, X Yang, X Lan, Y Wang, C Xu - arXiv preprint arXiv:2412.20206, 2024 - arxiv.org
Visual Grounding is also known as Referring Expression Comprehension and Phrase
Grounding. It involves localizing a natural number of specific regions within an image based …

Interactive Surgical Training in Neuroendoscopy: Real-Time Anatomical Feature Localization using Natural Language Expressions

NM Matasyoh, R Schmidt, RA Zeineldin… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Objective: This study addresses challenges in surgical education, particularly in
neuroendoscopy, where the demand for optimized workflow conflicts with the need for …

Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports

R Mahmood, KCL Wong, DM Reyes, N D'Souza… - arXiv preprint arXiv …, 2024 - arxiv.org
With the emergence of large-scale vision-language models, realistic radiology reports may
be generated using only medical images as input guided by simple prompts. However, their …

GaVA-CLIP: Refining Multimodal Representations with Clinical Knowledge and Numerical Parameters for Gait Video Analysis in Neurodegenerative Diseases

D Wang, K Yuan, C Bobenrieth, H Seo - 2024 - hal.science
We present GaVA-CLIP, a knowledge augmentation strategy for Gait Video Analysis,
designed to assess diagnostic groups and gait impairment. Based on the large-scale …