Convolutional Recurrent Neural Networks for Speech Activity Detection in Naturalistic Audio from Apollo Missions.

P Gimeno, D Ribas, AO Giménez, A Miguel… - …, 2021 - researchgate.net
IberSPEECH, 2021
Abstract
Speech Activity Detection (SAD) aims to correctly distinguish audio segments containing human speech. Several solutions have been successfully applied to the SAD task, with deep learning approaches being especially relevant nowadays. This paper describes a SAD solution based on Convolutional Recurrent Neural Networks (CRNN) presented as the ViVoLab submission to the 2020 Fearless Steps Challenge. The dataset used comes from the audio of Apollo space missions, presenting a challenging domain with strong degradation and several transmission noises. First, we explore the performance of 1D and 2D convolutional processing stages. Then we propose a novel architecture that fuses two convolutional feature maps, combining the information captured with 1D and 2D filters. The results obtained largely outperform the baseline provided by the organisation: all configurations achieve a detection cost function (DCF) below 2% on the development set. The best results are obtained with the proposed fusion architecture, which reaches a DCF of 1.78% on the evaluation set and ranks fourth among all participating teams in the challenge SAD task.
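To make the reported metric concrete, the following is a minimal sketch of a frame-level detection cost function. The abstract does not define the DCF, so the weights used here (0.75 on the miss rate and 0.25 on the false-alarm rate) are an assumption based on the usual challenge definition, and the function name `detection_cost` and the toy label sequences are purely illustrative. Any scoring collar or time-based weighting applied by the official scoring tool is omitted.

```python
def detection_cost(reference, hypothesis, w_miss=0.75, w_fa=0.25):
    """Frame-level detection cost: w_miss * P_miss + w_fa * P_fa.

    reference, hypothesis: equal-length sequences of 0 (non-speech) / 1 (speech).
    The default weights are an assumption, not taken from the abstract.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    speech = sum(1 for r in reference if r == 1)
    nonspeech = len(reference) - speech

    # Speech frames labelled as non-speech (misses).
    missed = sum(1 for r, h in zip(reference, hypothesis) if r == 1 and h == 0)
    # Non-speech frames labelled as speech (false alarms).
    false_alarms = sum(1 for r, h in zip(reference, hypothesis) if r == 0 and h == 1)

    p_miss = missed / speech if speech else 0.0
    p_fa = false_alarms / nonspeech if nonspeech else 0.0
    return w_miss * p_miss + w_fa * p_fa


# Toy example: one missed speech frame and one false alarm out of four each.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
hypothesis = [1, 1, 1, 0, 0, 0, 0, 1]
cost = detection_cost(reference, hypothesis)
print(f"DCF = {100 * cost:.2f}%")
```

Under these assumed weights a missed speech frame costs three times as much as a false alarm, so a system tuned for this metric would tend to favour slightly over-detecting speech.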