Data augmentation for deep neural network acoustic modeling

X Cui, V Goel, B Kingsbury - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
This paper investigates data augmentation for deep neural network acoustic modeling
based on label-preserving transformations to deal with data sparsity. Two data …

PARP: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Frontier Research on Low-Resource Speech Recognition Technology

W Slam, Y Li, N Urouvas - Sensors, 2023 - mdpi.com
With the development of continuous speech recognition technology, users have put forward
higher requirements in terms of speech recognition accuracy. Low-resource speech …

Maxout neurons for deep convolutional and LSTM neural networks in speech recognition

M Cai, J Liu - Speech Communication, 2016 - Elsevier
Deep neural networks (DNNs) have achieved great success in acoustic modeling for speech
recognition. However, DNNs with sigmoid neurons may suffer from the vanishing gradient …

Spoken content retrieval—beyond cascading speech recognition with text retrieval

L Lee, J Glass, H Lee, C Chan - IEEE/ACM Transactions on …, 2015 - ieeexplore.ieee.org
Spoken content retrieval refers to directly indexing and retrieving spoken content based on
the audio rather than text descriptions. This potentially eliminates the requirement of …

Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks

R Prabhavalkar, R Alvarez, C Parada… - … , Speech and Signal …, 2015 - ieeexplore.ieee.org
We explore techniques to improve the robustness of small-footprint keyword spotting models
based on deep neural networks (DNNs) in the presence of background noise and in far-field …

Deep maxout networks for low-resource speech recognition

Y Miao, F Metze, S Rawat - 2013 IEEE Workshop on Automatic …, 2013 - ieeexplore.ieee.org
As a feed-forward architecture, the recently proposed maxout networks integrate dropout
naturally and show state-of-the-art results on various computer vision datasets. This paper …

Automatic Identification of Emotional Information in Spanish TV Debates and Human–Machine Interactions

M de Velasco, R Justo, M Inés Torres - Applied Sciences, 2022 - mdpi.com
Automatic emotion detection is a very attractive field of research that can help build more
natural human–machine interaction systems. However, several issues arise when real …

Online automatic speech recognition with Listen, Attend and Spell model

R Hsiao, D Can, T Ng, R Travadi… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech
recognition (ASR) models have known limitations when operated in a fully online mode. In …

A high-performance Cantonese keyword search system

B Kingsbury, J Cui, X Cui, MJF Gales… - … , Speech and Signal …, 2013 - ieeexplore.ieee.org
We present a system for keyword search on Cantonese conversational telephony audio,
collected for the IARPA Babel program, that achieves good performance by combining …