poses a significant challenge in the field of artificial intelligence. Unfortunately, current
state-of-the-art video generation methods, which focus primarily on text-to-video generation,
tend to produce video clips with minimal motion despite maintaining high fidelity. We argue that
relying solely on text instructions is insufficient and suboptimal for video generation. In this
paper, we introduce PixelDance, a novel approach based on diffusion models that …