in learning either global semantic information or local contextual information. To tackle these
issues, a novel network named SegTransVAE is proposed in this paper. SegTransVAE is
built upon an encoder-decoder architecture, exploiting a transformer and adding a variational
autoencoder (VAE) branch to the network to reconstruct the input images jointly with
segmentation. To the best of our knowledge, this is the first method combining the success of …