Deep belief networks based voice activity detection

XL Zhang, J Wu - IEEE Transactions on Audio, Speech, and …, 2012 - ieeexplore.ieee.org
Fusing the advantages of multiple acoustic features is important for the robustness of voice
activity detection (VAD). Recently, the machine-learning-based VADs have shown a …

DeepMMSE: A deep learning approach to MMSE-based noise power spectral density estimation

Q Zhang, A Nicolson, M Wang… - … /ACM Transactions on …, 2020 - ieeexplore.ieee.org
An accurate noise power spectral density (PSD) tracker is an indispensable component of a
single-channel speech enhancement system. Bayesian-motivated minimum mean-square …

Boosting contextual information for deep neural network based voice activity detection

XL Zhang, DL Wang - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
Voice activity detection (VAD) is an important topic in audio signal processing. Contextual
information is important for improving the performance of VAD at low signal-to-noise ratios …

Features for voice activity detection: a comparative analysis

S Graf, T Herbig, M Buck, G Schmidt - EURASIP Journal on Advances in …, 2015 - Springer
In many speech signal processing applications, voice activity detection (VAD) plays an
essential role for separating an audio stream into time intervals that contain speech activity …

End-to-end active speaker detection

JL Alcázar, M Cordes, C Zhao, B Ghanem - European Conference on …, 2022 - Springer
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage
process: feature extraction and spatio-temporal context aggregation. In this paper, we …

A study of voice activity detection techniques for NIST speaker recognition evaluations

MW Mak, HB Yu - Computer Speech & Language, 2014 - Elsevier
Since 2008, interview-style speech has become an important part of the NIST speaker
recognition evaluations (SREs). Unlike telephone speech, interview speech has lower …

Voice activity detection based on an unsupervised learning framework

D Ying, Y Yan, J Dang, FK Soong - IEEE Transactions on Audio …, 2011 - ieeexplore.ieee.org
How to construct models for speech/nonspeech discrimination is a crucial point for voice
activity detectors (VADs). Semi-supervised learning is the most popular way for model …

Enabling voice-accompanying hand-to-face gesture recognition with cross-device sensing

Z Li, C Liang, Y Wang, Y Qin, C Yu, Y Yan… - Proceedings of the …, 2023 - dl.acm.org
Gestures performed accompanying the voice are essential for voice interaction to convey
complementary semantics for interaction purposes such as wake-up state and input …

An end-to-end multimodal voice activity detection using wavenet encoder and residual networks

I Ariav, I Cohen - IEEE Journal of Selected Topics in Signal …, 2019 - ieeexplore.ieee.org
Recently, there has been growing use of deep neural networks in many modern
speech-based systems such as speaker recognition, speech enhancement, and emotion …

Reducing F0 frame error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend

W Chu, A Alwan - … on Acoustics, Speech and Signal Processing, 2009 - ieeexplore.ieee.org
In this paper, we propose an F0 Frame Error (FFE) metric which combines Gross Pitch Error
(GPE) and Voicing Decision Error (VDE) to objectively evaluate the performance of …