Convolutional neural networks (CNNs) have attracted considerable interest in seismic interpolation. In these networks, convolution operators extract features from the seismic data and guide the interpolation network to learn the mapping between corrupted data and their labels. However, because the convolution kernel has a local receptive field, the trained network captures only the interrelationships between local regions of the data, which limits interpolation accuracy. The Transformer, built on a self-attention mechanism, has performed well in many areas. Motivated by this, we propose a multi-scale Transformer (MST) to restore incomplete seismic data. Building on self-attention, the Transformer module computes multiple groups of self-attention over multi-scale feature maps to capture long-range dependencies, allowing it to recover the fine details of missing data with higher accuracy. Interpolation experiments on synthetic and field seismic data verify the performance of the proposed reconstruction method.
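The core idea of computing self-attention over multi-scale feature maps can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the single-head formulation, the average-pooling scale factors, and the nearest-neighbour upsampling are all assumptions made for brevity. It shows how attention at coarser scales lets every output location aggregate information from the whole feature map, in contrast to a convolution's local receptive field.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (tokens, channels); single-head scaled dot-product self-attention.
    # Every output token is a weighted mix of ALL input tokens, so the
    # receptive field is global, unlike a convolution kernel.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores) @ x

def multi_scale_attention(feat, scales=(1, 2, 4)):
    # feat: (H, W, C) feature map. For each scale, average-pool by that
    # factor, attend over the pooled tokens, upsample back, and average
    # the results. Scale factors here are illustrative assumptions.
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    for s in scales:
        # average-pool by factor s (assumes H and W are divisible by s)
        pooled = feat.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))
        attended = self_attention(pooled.reshape(-1, C)).reshape(H // s, W // s, C)
        # nearest-neighbour upsample back to (H, W)
        out += attended.repeat(s, axis=0).repeat(s, axis=1)
    return out / len(scales)

feat = np.random.randn(8, 8, 16)  # toy feature map
y = multi_scale_attention(feat)
print(y.shape)
```

Coarser scales (larger `s`) trade spatial resolution for cheaper global attention, while the finest scale preserves detail; summing the scales combines both, which is one plausible reading of how multi-scale attention helps recover fine structure in missing traces.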