Medical phrase grounding with region-phrase context contrastive alignment

S Bannur, K Bouzid, DC Castro, A Schwaighofer… - arXiv preprint arXiv …, 2024 - arxiv.org

Radiology reporting is a complex task requiring detailed medical image understanding and
precise language generation, for which generative multimodal models offer a promising …

Cited by 19 · Related articles · All 2 versions

ChEX: Interactive localization and region description in chest X-rays

P Müller, G Kaissis, D Rueckert - European Conference on Computer …, 2025 - Springer

Report generation models offer fine-grained textual interpretations of medical images like
chest X-rays, yet they often lack interactivity (i.e., the ability to steer the generation process …

Cited by 3 · Related articles · All 2 versions

A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis

Y Liu, Y Li, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - Meta-Radiology, 2024 - Elsevier

This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on
three representative tasks: radiology report generation, medical visual question answering …

Cited by 5 · Related articles

A comprehensive study of GPT-4V's multimodal capabilities in medical imaging

Y Li, Y Liu, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - medRxiv, 2023 - medrxiv.org

This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse
medical imaging tasks, including Radiology Report Generation, Medical Visual Question …

Cited by 20 · Related articles · All 4 versions

MedRG: Medical Report Grounding with Multi-modal Large Language Model

K Zou, Y Bai, Z Chen, Y Zhou, Y Chen, K Ren… - arXiv preprint arXiv …, 2024 - arxiv.org

Medical Report Grounding is pivotal in identifying the most relevant regions in medical
images based on a given phrase query, a critical aspect in medical image analysis and …

Cited by 4 · Related articles · All 2 versions

Automatic medical report generation combining contrastive learning and feature difference

C Lyu, C Qiu, K Han, S Li, VS Sheng, H Rong… - Knowledge-Based …, 2024 - Elsevier

The automatic medical report generation is a challenging task because it requires accurate
capture and description of abnormal regions, especially for those discrepancies between …

Towards Visual Grounding: A Survey

L Xiao, X Yang, X Lan, Y Wang, C Xu - arXiv preprint arXiv:2412.20206, 2024 - arxiv.org

Visual Grounding is also known as Referring Expression Comprehension and Phrase
Grounding. It involves localizing a natural number of specific regions within an image based …

Interactive Surgical Training in Neuroendoscopy: Real-Time Anatomical Feature Localization using Natural Language Expressions

NM Matasyoh, R Schmidt, RA Zeineldin… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Objective: This study addresses challenges in surgical education, particularly in
neuroendoscopy, where the demand for optimized workflow conflicts with the need for …

Cited by 2 · Related articles · All 3 versions

Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports

R Mahmood, KCL Wong, DM Reyes, N D'Souza… - arXiv preprint arXiv …, 2024 - arxiv.org

With the emergence of large-scale vision-language models, realistic radiology reports may
be generated using only medical images as input guided by simple prompts. However, their …

GaVA-CLIP: Refining Multimodal Representations with Clinical Knowledge and Numerical Parameters for Gait Video Analysis in Neurodegenerative Diseases

D Wang, K Yuan, C Bobenrieth, H Seo - 2024 - hal.science

We present GaVA-CLIP, a knowledge augmentation strategy for Gait Video Analysis,
designed to assess diagnostic groups and gait impairment. Based on the large-scale …
