Semantic segmentation of remote sensing images by interactive representation refinement and geometric prior-guided inference

X Li, F Xu, F Liu, Y Tong, X Lyu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
X Li, F Xu, F Liu, Y Tong, X Lyu, J Zhou
IEEE Transactions on Geoscience and Remote Sensing, 2023ieeexplore.ieee.org
High spatial resolution remote sensing images (HRRSIs) contain intricate details and varied
spectral distributions, making their semantic segmentation a challenging task. To address
this problem, it is crucial to adequately capture both local and global contexts to reduce
semantic ambiguity. While self-attention modules in vision transformers capture long-range
context, they tend to sacrifice local details. In this article, we propose a geometric prior-
guided interactive network (GPINet), a hybrid network that refines features across encoder …
High spatial resolution remote sensing images (HRRSIs) contain intricate details and varied spectral distributions, making their semantic segmentation a challenging task. To address this problem, it is crucial to adequately capture both local and global contexts to reduce semantic ambiguity. While self-attention modules in vision transformers capture long-range context, they tend to sacrifice local details. In this article, we propose a geometric prior-guided interactive network (GPINet), a hybrid network that refines features across encoder and decoder stages. First of all, a dual branch structure encoder with local-global interaction modules (LGIMs) is designed to fully exploit local and global contexts for feature refinement. Unlike commonly used skip connections or concatenations, the LGIMs bilaterally couple and exchange CNN features with transformer features by lossless transformation and elaborating cross-attention. Moreover, we introduce a geometric prior generation module (GPGM) that iteratively updates the randomly initialized geometric prior. Subsequently, the geometric priors are stored and used to guide feature recovery. Finally, a weighted summation is applied to the upsampled decoded features and geometric priors. By comprehensively capturing contexts and enabling lossless decoding and deterministic inference, GPINet allows the network to learn discriminative representations for accurately specifying pixel-level semantics. Experiments on three benchmark datasets demonstrate the superiority of the proposed GPINet over state-of-the-art methods. Furthermore, we validate the effectiveness of geometric priors and compare the model sizes.
ieeexplore.ieee.org
以上显示的是最相近的搜索结果。 查看全部搜索结果