Learning a multi-task transformer via unified and customized instruction tuning for chest radiograph interpretation

L Xu, Z Ni, X Liu, X Wang, H Li, S Zhang - arXiv preprint arXiv:2311.01092, 2023 - arxiv.org
The emergence of multi-modal deep learning models has had a significant impact on clinical applications in the last decade. However, the majority of models are limited to single tasks, without considering that disease diagnosis is inherently a multi-task procedure. Here, we demonstrate a unified transformer model specifically designed for multi-modal clinical tasks by incorporating customized instruction tuning. We first compose a multi-task training dataset comprising 13.4 million instruction and ground-truth pairs (covering approximately one million radiographs) for the customized tuning, involving both image-level and pixel-level tasks. This allows us to unify the various vision-intensive tasks in a single training framework with homogeneous model inputs and outputs, increasing clinical interpretability in one reading. Finally, we demonstrate the overall superior performance of our model compared to prior art on various chest X-ray benchmarks across multiple tasks, in both direct inference and fine-tuning settings. Three radiologists further evaluate the generated reports against the recorded ones, which also exhibits the enhanced explainability of our multi-task model.
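To make the data-composition idea concrete, below is a minimal, hypothetical sketch (not taken from the paper) of how instruction and ground-truth pairs for image-level and pixel-level chest X-ray tasks might be serialized into one homogeneous text-target format; all names, prompts, and the region-token scheme are illustrative assumptions.

```python
# Hypothetical sketch of multi-task instruction / ground-truth pairs for one radiograph.
# The unifying idea: every task's answer is serialized as text, so image-level tasks
# (classification, report generation) and pixel-level tasks (region prediction) can
# share a single transformer with homogeneous inputs and outputs.
from dataclasses import dataclass
from typing import List


@dataclass
class InstructionPair:
    image_path: str    # path to the chest radiograph
    instruction: str   # task prompt given to the unified model
    target: str        # ground truth serialized as text


def make_examples(image_path: str) -> List[InstructionPair]:
    """Compose several task-specific pairs from a single image, mirroring how one
    radiograph can yield many instruction/ground-truth pairs during tuning."""
    return [
        # Image-level task: disease classification as a textual label list.
        InstructionPair(image_path,
                        "List the abnormal findings in this chest X-ray.",
                        "cardiomegaly, pleural effusion"),
        # Image-level task: free-text report generation.
        InstructionPair(image_path,
                        "Write the findings section of the radiology report.",
                        "The cardiac silhouette is enlarged. Small left pleural effusion."),
        # Pixel-level task: a region serialized as discretized coordinate tokens,
        # so localization shares the same text output space as the other tasks.
        InstructionPair(image_path,
                        "Locate the left lung and return its region tokens.",
                        "<region> 12 34 210 498 </region>"),
    ]


if __name__ == "__main__":
    for ex in make_examples("cxr_000001.png"):
        print(ex.instruction, "->", ex.target)
```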