Recent advances in vision-language foundation models (FMs) make it possible
to automate chest X-ray (CXR) interpretation, which can assist physicians
with clinical decision-making and improve patient outcomes. However, developing FMs that
can accurately interpret CXRs is challenging due to (1) the limited availability of large-scale
vision-language datasets in the medical imaging domain, (2) the lack of vision and language …