Binaural classification-based speech segregation and robust speaker recognition system

R Venkatesan, A Balaji Ganesh - Circuits, Systems, and Signal Processing, 2018 - Springer
Abstract
The paper presents an auditory scene analyser that comprises two joint, simultaneous modules: binaural speech segregation and speaker recognition. Binaural speech segregation is realized by incorporating interaural time and level differences, interaural phase difference and interaural coherence, along with the direct-to-reverberant ratio, into a deep recurrent neural network. The performance of the deep recurrent network-based speech segregation is validated in terms of source-to-interference ratio, source-to-distortion ratio and source-to-artifacts ratio, and is compared with existing architectures, including a deep neural network. It is observed that the performance of a conventional deep recurrent neural network can be improved further by including discriminative objectives along with soft time–frequency masking as a layer in the network structure. The system also proposes a spectro-temporal extractor, referred to as Gabor–Hilbert envelope coefficients (GHEC). This monaural feature extracts discriminative acoustic information from the segregated speech sources. The performance of GHEC is validated under various noisy and reverberant environments, and the results are compared with existing monaural features. The binaural speech segregation results show a signal-to-noise ratio that is better than that of other baseline algorithms by 0.7 dB on average, even at a high reverberation time of 0.89 s.
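To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes one frame of complex short-time Fourier transform bins per ear, and it assumes the common ratio form of a soft time–frequency mask, |A| / (|A| + |B|), where the magnitudes would come from a network's two output branches. All function names and the `EPS` constant are hypothetical, and the abstract does not specify how the paper computes these features.

```python
import cmath
import math

EPS = 1e-12  # assumed guard against division by zero and log of zero


def soft_mask(mag_a, mag_b):
    """Soft time-frequency mask for source A: |A| / (|A| + |B|), per bin."""
    return [a / (a + b + EPS) for a, b in zip(mag_a, mag_b)]


def apply_masks(mixture, mag_a, mag_b):
    """Split one frame of complex mixture bins into two source estimates.

    mag_a and mag_b stand in for the magnitudes a network would predict.
    """
    mask_a = soft_mask(mag_a, mag_b)
    est_a = [m * x for m, x in zip(mask_a, mixture)]
    est_b = [(1.0 - m) * x for m, x in zip(mask_a, mixture)]
    return est_a, est_b


def interaural_level_difference(left, right):
    """ILD in dB per bin: 20 * log10(|L| / |R|)."""
    return [20.0 * math.log10((abs(l) + EPS) / (abs(r) + EPS))
            for l, r in zip(left, right)]


def interaural_phase_difference(left, right):
    """IPD in radians per bin: the phase of L * conj(R)."""
    return [cmath.phase(l * r.conjugate()) for l, r in zip(left, right)]
```

Because the two masks sum to one in every bin, the two source estimates add back up to the mixture, which is the property that makes the masking step usable as a deterministic layer on top of the predicted magnitudes.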