作者
Szymon Drgas, Tuomas Virtanen
发表日期
2021/11/1
期刊
Computer Speech & Language
卷号
70
页码范围
101223
出版商
Academic Press
简介
In this article, we propose a new method for joint cochannel speaker separation and recognition called adaptive-dictionary non-negative matrix deconvolution (DANMD). This method is an extension of non-negative matrix deconvolution (NMD) which models spectrogram matrix as a linear combination of dictionary elements (atoms). We propose a dictionary which is a linear combination of speaker-independent component and components representing speaker variability. The dictionary is parametric and all atoms depend on a small number of parameters. The speaker-independent component and components representing speaker variability are learned from recordings of tens or hundreds of speakers. We show that the proposed method can be applied to the single-channel speech separation task where two speakers of unknown identity are to be separated. In a scenario where the unknown speakers’ recordings …