MidiBERT-piano: large-scale pre-training for symbolic music understanding

YH Chou, I Chen, CJ Chang, J Ching… - arXiv preprint arXiv:2107.05223, 2021 - arxiv.org
This paper presents an attempt to employ the masked language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pre-trained Transformer, our models outperform recurrent neural network based baselines with fewer than 10 epochs of fine-tuning. Ablation studies show that the pre-training remains effective even if none of the MIDI data of the downstream tasks are seen at the pre-training stage, and that freezing the self-attention layers of the Transformer at the fine-tuning stage slightly degrades performance. All five datasets employed in this work are publicly available, as are checkpoints of our pre-trained and fine-tuned models. As such, our research can be taken as a benchmark for symbolic-domain music understanding.
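The masked language modeling objective the abstract refers to can be illustrated with a minimal sketch of BERT-style input corruption (the standard 80/10/10 rule applied to ~15% of positions). This is an assumption-laden illustration, not the paper's actual code: the function name `mask_for_mlm`, the integer token ids, and the `-100` ignore index (a common PyTorch convention for cross-entropy loss) are all illustrative choices.

```python
import random

def mask_for_mlm(tokens, mask_id, vocab_size, mask_prob=0.15, rng=None):
    """BERT-style masking: select ~mask_prob of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns (corrupted, labels); labels holds the original token at
    selected positions and -100 (loss ignore index) elsewhere."""
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must reconstruct this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_id)               # 80%: [MASK]
            elif r < 0.9:
                corrupted.append(rng.randrange(vocab_size))  # 10%: random
            else:
                corrupted.append(tok)                   # 10%: unchanged
        else:
            labels.append(-100)   # position ignored by the loss
            corrupted.append(tok)
    return corrupted, labels
```

During pre-training, the Transformer receives the corrupted sequence and is trained with cross-entropy loss only at the selected positions; in MidiBERT-piano the "tokens" would be symbolic-music events rather than word pieces.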