Fundamentals, present and future perspectives of speech enhancement

N Das, S Chakraborty, J Chaki, N Padhy… - International Journal of …, 2021 - Springer
Speech enhancement has substantial interest in the utilization of speaker identification,
video-conference, speech transmission through communication channels, speech-based …

Power-normalized cepstral coefficients (PNCC) for robust speech recognition

C Kim, RM Stern - IEEE/ACM Transactions on audio, speech …, 2016 - ieeexplore.ieee.org
This paper presents a new feature extraction algorithm called power normalized Cepstral
coefficients (PNCC) that is motivated by auditory processing. Major new features of PNCC …

Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction

V Vestman, D Gowda, M Sahidullah, P Alku… - Speech …, 2018 - Elsevier
From the available biometric technologies, automatic speaker recognition is one of the most
convenient and accessible ones due to abundance of mobile devices equipped with a …

Efficient spoken term discovery using randomized algorithms

A Jansen, B Van Durme - 2011 IEEE Workshop on Automatic …, 2011 - ieeexplore.ieee.org
Spoken term discovery is the task of automatically identifying words and phrases in speech
data by searching for long repeated acoustic patterns. Initial solutions relied on exhaustive …

Linear versus mel frequency cepstral coefficients for speaker recognition

X Zhou, D Garcia-Romero… - 2011 IEEE workshop …, 2011 - ieeexplore.ieee.org
Mel-frequency cepstral coefficients (MFCC) have been dominantly used in speaker
recognition as well as in speech recognition. However, based on theories in speech …

Rapid evaluation of speech representations for spoken term discovery

MA Carlin, S Thomas, A Jansen… - … Annual Conference of …, 2011 - academia.edu
Acoustic front-ends are typically developed for supervised learning tasks and are thus
optimized to minimize word error rate, phone error rate, etc. However, in recent efforts to …

Environmentally robust ASR front-end for deep neural network acoustic models

T Yoshioka, MJF Gales - Computer Speech & Language, 2015 - Elsevier
This paper examines the individual and combined impacts of various front-end approaches
on the performance of deep neural network (DNN) based speech recognition systems in …

Robust language identification using convolutional neural network features

S Ganapathy, KJ Han, S Thomas, MK Omar… - Interspeech, 2014 - isca-archive.org
The language identification (LID) task in the Robust Automatic Transcription of Speech
(RATS) program is challenging due to the noisy nature of the audio data collected over …

Query-by-example spoken term detection using frequency domain linear prediction and non-segmental dynamic time warping

G Mantena, S Achanta… - IEEE/ACM Transactions on …, 2014 - ieeexplore.ieee.org
The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query
within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge …

Speech dereverberation with frequency domain autoregressive modeling

A Purushothaman, D Dutta, R Kumar… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Speech applications in far-field real world settings often deal with signals that are corrupted
by reverberation. The task of dereverberation constitutes an important step to improve the …