image and video understanding tasks using self-attention. However, due to the quadratic
computational and memory complexities of self-attention, these works either apply attention
only to low-resolution feature maps in later stages of a deep network or restrict the receptive
field of attention in each layer to a small local region.
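To make this cost concrete, the short NumPy sketch below computes dense self-attention over an H × W feature map; the function name `dense_self_attention` and the identity query/key/value projections are our simplifications for exposition, not part of any cited method. The pairwise logits tensor has (HW)² entries, so doubling the spatial resolution multiplies the attention-matrix size by sixteen.

```python
# Minimal sketch of dense (global) self-attention over a feature map,
# assuming identity query/key/value projections for brevity; this is an
# illustration of the quadratic cost, not the module proposed here.
import numpy as np

def dense_self_attention(x: np.ndarray) -> np.ndarray:
    """x: (H, W, d) feature map -> attended (H, W, d) output."""
    h, w, d = x.shape
    q = k = v = x.reshape(h * w, d)               # flatten to n = H*W positions
    logits = q @ k.T / np.sqrt(d)                 # (n, n): the quadratic term
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v).reshape(h, w, d)

# The attention matrix alone grows with the fourth power of resolution:
for h in (14, 56, 112):
    n = h * h
    print(f"{h}x{h} map -> {n * n:,} attention entries")
# 14x14   ->      38,416
# 56x56   ->   9,834,496
# 112x112 -> 157,351,936   (~4096x the 14x14 cost)
```

At a late-stage 14 × 14 map the attention matrix is cheap, but at an early-stage 112 × 112 map it already has roughly 1.6 × 10^8 entries per head, which is what drives prior works to low-resolution maps or local windows.
To overcome these limitations, this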
work introduces a new global self-attention module, referred to as the GSA module, which is …