Convolutional neural networks (CNNs), especially Fully Convolutional Networks (FCNs) like
U-Net, have shown remarkable success in medical image segmentation tasks. However,
they are limited in capturing global context and long-range dependencies, especially for
objects that vary significantly in shape, scale, and texture. While transformers have
achieved state-of-the-art results in natural language processing and image recognition, they …