Self-supervised speech representation learning (speech SSL) has demonstrated the benefit of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
W Slam, Y Li, N Urouvas - Sensors, 2023 - mdpi.com
As continuous speech recognition technology has developed, users have come to expect higher recognition accuracy. Low-resource speech …
M Cai, J Liu - Speech Communication, 2016 - Elsevier
Deep neural networks (DNNs) have achieved great success in acoustic modeling for speech recognition. However, DNNs with sigmoid neurons may suffer from the vanishing gradient …
L Lee, J Glass, H Lee, C Chan - IEEE/ACM Transactions on …, 2015 - ieeexplore.ieee.org
Spoken content retrieval refers to directly indexing and retrieving spoken content based on the audio rather than text descriptions. This potentially eliminates the requirement of …
We explore techniques to improve the robustness of small-footprint keyword spotting models based on deep neural networks (DNNs) in the presence of background noise and in far-field …
Y Miao, F Metze, S Rawat - 2013 IEEE Workshop on Automatic …, 2013 - ieeexplore.ieee.org
As a feed-forward architecture, the recently proposed maxout networks integrate dropout naturally and show state-of-the-art results on various computer vision datasets. This paper …
Automatic emotion detection is a very attractive field of research that can help build more natural human–machine interaction systems. However, several issues arise when real …
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In …
We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining …