Authors
Khalid Al-hammuri, Fayez Gebali, Awos Kanan
Publication date
2024/2/29
Description
Medical image segmentation is important for extracting objects of interest from complex human anatomical structures to enable further analysis. In lingual ultrasound, extracting the tongue contour is important for understanding language behaviour, which allows lingual ultrasound to act as biofeedback. Segmenting the tongue from ultrasound images normally requires training a deep-learning model on a large dataset; because such data are difficult to collect, it is challenging to build a model that generalizes across a wide variety of images. In this research, we propose a strategy and a generalized model that work effectively with a well-managed small dataset. This article presents a hybrid architecture that combines UNet, Vision Transformer (ViT), and contrastive loss to build a foundation model cumulatively. The process starts with building a reference representation in the embedding space using human experts to validate any …