Sparse modeling of magnitude and phase-derived spectra for playing technique classification

L Su, HM Lin, YH Yang
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014. ieeexplore.ieee.org
Computational modeling of musical timbre is important for a variety of music information retrieval applications. While considerable progress has been made in recognizing musical genres and instruments, relatively little attention has been paid to modeling playing techniques, which affect timbre in more subtle ways. In this paper, we contribute to this area of research by systematically evaluating various audio features and processing methods for multi-class playing technique classification, considering up to nine distinct playing techniques of bowed string instruments. Specifically, a collection of 6,759 chamber-recorded single notes of four bowed string instruments and a collection of 33 real-world solo violin recordings are used in the evaluation. Our evaluation shows that using sparse features extracted from the magnitude spectra and phase derivatives, including the group delay function (GDF) and instantaneous frequency deviation (IFD), leads to significantly better performance than using a combination of state-of-the-art temporal, spectral, cepstral and harmonic feature descriptors. For playing technique classification of violin single notes, the former approach attains a macro-average F-score of 0.915 under a tenfold cross-validation setting, while the latter attains only 0.835. Moreover, sparse modeling of magnitude and phase-derived spectra also performs well for single-note joint instrument-technique classification (F-score 0.770) and for playing technique classification of real-world violin solos (F-score 0.547). We find that phase information is particularly important in discriminating playing techniques with subtle differences, such as playing with different bowing positions (i.e., normal, sul tasto, and sul ponticello). A systematic investigation of the effect of parameters such as window sizes, hop factors, and window types for phase-derived features is also reported to provide further insight.
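The results above are reported as macro-average F-scores. As a minimal sketch of what that metric usually means, the following computes the unweighted mean of per-class F1 scores. The abstract does not spell out the exact averaging convention used in the paper, so the standard definition is assumed here; the function name `macro_f_score` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (bowing positions named in the abstract).
y_true = ["normal", "normal", "sul tasto", "sul ponticello"]
y_pred = ["normal", "sul tasto", "sul tasto", "sul ponticello"]
print(macro_f_score(y_true, y_pred))
```

Because every class contributes equally regardless of how many notes it has, a technique with few examples can pull the macro average down as much as a common one.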