Frozen language model helps ecg zero-shot learning

J Li, C Liu, S Cheng, R Arcucci… - Medical Imaging with …, 2024 - proceedings.mlr.press
Medical Imaging with Deep Learning, 2024proceedings.mlr.press
The electrocardiogram (ECG) is one of the most commonly used non-invasive, convenient
medical monitoring tools that assist in the clinical diagnosis of heart diseases. Recently,
deep learning (DL) techniques, particularly self-supervised learning (SSL), have
demonstrated great potential in the classification of ECGs. SSL pre-training has achieved
competitive performance with only a small amount of annotated data after fine-tuning.
However, current SSL methods rely on the availability of annotated data and are unable to …
Abstract
The electrocardiogram (ECG) is one of the most commonly used non-invasive, convenient medical monitoring tools that assist in the clinical diagnosis of heart diseases. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in the classification of ECGs. SSL pre-training has achieved competitive performance with only a small amount of annotated data after fine-tuning. However, current SSL methods rely on the availability of annotated data and are unable to predict labels not existing in fine-tuning datasets. To address this challenge, we propose\textbf {M} ultimodal\textbf {E} CG-\textbf {T} ext\textbf {S} elf-supervised pre-training (METS),\textbf {the first work} to utilize the auto-generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to embed paired ECGs and automatically machine-generated clinical reports separately, then the ECG embedding and paired report embedding are compared with other unpaired embeddings. In downstream classification tasks, METS achieves around 10% improvement in performance without using any annotated data via zero-shot classification, compared to other supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, despite MIT-BIH containing different classes of ECGs compared to the pre-trained dataset. The extensive experiments have demonstrated the advantages of using ECG-Text multimodal self-supervised learning in terms of generalizability and effectiveness.
proceedings.mlr.press
以上显示的是最相近的搜索结果。 查看全部搜索结果