Voice activity detection in the wild: A data-driven approach using teacher-student training

H Dinkel, S Wang, X Xu, M Wu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Voice activity detection is an essential pre-processing component for speech-related tasks
such as automatic speech recognition (ASR). Traditional supervised VAD systems obtain …

Spatial–Temporal Feature Network for Speech-Based Depression Recognition

Z Han, Y Shang, Z Shao, J Liu, G Guo… - … on Cognitive and …, 2023 - ieeexplore.ieee.org
Depression is a serious mental disorder that has received increased attention from society.
Due to the advantage of easy acquisition of speech, researchers have tried to propose …

A unified deep learning framework for short-duration speaker verification in adverse environments

Y Jung, Y Choi, H Lim, H Kim - IEEE Access, 2020 - ieeexplore.ieee.org
Speaker verification (SV) has recently attracted considerable research interest due to the
growing popularity of virtual assistants. At the same time, there is an increasing requirement …

Voice frequency synthesis using VAW-GAN based amplitude scaling for emotion transformation

HJ Kwon, MJ Kim, JW Baek… - KSII Transactions on …, 2022 - koreascience.kr
Mostly, artificial intelligence does not show any definite change in emotions. For this reason,
it is hard to demonstrate empathy in communication with humans. If frequency modification …

A Lightweight Framework for Online Voice Activity Detection in the Wild

X Xu, H Dinkel, M Wu, K Yu - Interspeech, 2021 - isca-archive.org
Voice activity detection (VAD) is an essential pre-processing component for speech-related
tasks such as automatic speech recognition (ASR). Traditional VAD systems require strong …

Cross-domain voice activity detection with self-supervised representations

S Alisamir, F Ringeval, F Portet - arXiv preprint arXiv:2209.11061, 2022 - arxiv.org
Voice Activity Detection (VAD) aims at detecting speech segments on an audio signal, which
is a necessary first step for many of today's speech-based applications. Current state-of-the-art …

AviPer: assisting visually impaired people to perceive the world with visual-tactile multimodal attention network

X Li, M Huang, Y Xu, Y Cao, Y Lu, P Wang… - CCF Transactions on …, 2022 - Springer
Unlike able-bodied persons, it is difficult for visually impaired people, especially those in the
educational age, to build a full perception of the world due to the lack of normal vision. The …

Enrollment-less training for personalized voice activity detection

N Makishima, M Ihori, T Tanaka, A Takashima… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a novel personalized voice activity detection (PVAD) learning method that does
not require enrollment data during training. PVAD is a task to detect the speech segments of …

SG-VAD: Stochastic gates based speech activity detection

J Svirsky, O Lindenbaum - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We propose a novel voice activity detection (VAD) model in a low-resource environment.
Our key idea is to model VAD as a denoising task and construct a network that is designed …

Speech activity detection based on multilingual speech recognition system

SS Sarfjoo, S Madikeri, P Motlicek - arXiv preprint arXiv:2010.12277, 2020 - arxiv.org
To better model the contextual information and increase the generalization ability of a Speech
Activity Detection (SAD) system, this paper leverages a multi-lingual Automatic Speech …