labels (e.g., piano, guitar) to notes from unlabeled mixtures that contain only performance
information. To address this problem, we adopt diffusion models and explicitly guide them to
preserve consistency between the mixtures and the music. Quantitative results show that the
proposed model generates high-fidelity, creative samples of multitrack symbolic music.