Recent works achieve excellent results on the dual-pixel defocus deblurring task using convolutional neural networks (CNNs), while the scarcity of data limits the exploration of vision transformers for this task. In this paper, we propose a dynamic multi-scale network, named DMTNet, for dual-pixel image defocus deblurring. In DMTNet, the feature extraction module is composed of several vision transformer blocks, whose powerful feature extraction capability yields robust features. The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Modules (DMSSRM). DMSSRM restores images by adaptively assigning weights to features from different scales according to the blur distribution and content information of the input images. DMTNet combines the advantages of the transformer and CNN: the vision transformer raises the performance ceiling of the CNN, and the inductive bias of the CNN enables the transformer to extract robust features without relying on a large amount of data. Experimental results on popular benchmarks demonstrate that DMTNet significantly outperforms state-of-the-art methods.
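
The adaptive per-scale weighting described for DMSSRM can be illustrated with a minimal sketch. This is a simplification under assumed shapes: the function name `dynamic_multiscale_fusion` and the softmax-gated combination are illustrative stand-ins, not the paper's exact formulation, and the gate scores (which the network would predict from the input's blur and content) are supplied directly here.

```python
import numpy as np

def dynamic_multiscale_fusion(features, gates):
    """Fuse per-scale feature maps with content-dependent weights.

    features: list of S arrays, each (H, W, C) -- features from S scales,
              assumed already resized to a common resolution.
    gates:    length-S sequence of scores; in the real network these would
              be predicted from the input image's blur/content, but they
              are passed in directly for this illustration.
    """
    scores = np.asarray(gates, dtype=np.float64)
    # Softmax turns the scores into weights that sum to 1, letting the
    # model shift emphasis between coarse and fine scales per input.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of the per-scale feature maps.
    fused = sum(w * f for w, f in zip(weights, features))
    return fused, weights
```

With equal gate scores the fusion reduces to a plain average of the scales; unequal scores let one scale dominate, which is the intuition behind assigning weights by blur distribution.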