Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Conditional generative adversarial networks for speech enhancement and noise-robust speaker verification

D Michelsanti, ZH Tan - arXiv preprint arXiv:1709.01703, 2017 - arxiv.org
Improving speech system performance in noisy environments remains a challenging task,
and speech enhancement (SE) is one of the effective techniques to solve the problem …

rVAD: An unsupervised segment-based robust voice activity detection method

ZH Tan, N Dehak - Computer Speech & Language, 2020 - Elsevier
This paper presents an unsupervised segment-based method for robust voice activity
detection (rVAD). The method consists of two passes of denoising followed by a voice …

Speech intelligibility potential of general and specialized deep neural network based speech enhancement systems

M Kolbæk, ZH Tan, J Jensen - IEEE/ACM Transactions on …, 2016 - ieeexplore.ieee.org
In this paper, we study aspects of single microphone speech enhancement (SE) based on
deep neural networks (DNNs). Specifically, we explore the generalizability capabilities of …

HI-MIA: A far-field text-dependent speaker verification database and the baselines

X Qin, H Bu, M Li - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
This paper presents a far-field text-dependent speaker verification database named HI-MIA.
We aim to meet the data requirement for far-field microphone array based speaker …

Within-sample variability-invariant loss for robust speaker recognition under noisy environments

D Cai, W Cai, M Li - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Despite the significant improvements in speaker recognition enabled by deep neural
networks, unsatisfactory performance persists under noisy environments. In this paper, we …

Monaural speech enhancement using deep neural networks by maximizing a short-time objective intelligibility measure

M Kolbæk, ZH Tan, J Jensen - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
In this paper we propose a Deep Neural Network (DNN) based Speech Enhancement (SE)
system that is designed to maximize an approximation of the Short-Time Objective …

Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation

X Qin, D Cai, M Li - Interspeech, 2019 - isca-archive.org
In this paper, we focus on the far-field end-to-end text-dependent speaker verification task
with a small-scale far-field text-dependent dataset and a large-scale close-talking text …

Deep-learning-based audio-visual speech enhancement in presence of Lombard effect

D Michelsanti, ZH Tan, S Sigurdsson, J Jensen - Speech Communication, 2019 - Elsevier
When speaking in presence of background noise, humans reflexively change their way of
speaking in order to improve the intelligibility of their speech. This reflex is known as …

Bone-conducted speech enhancement using deep denoising autoencoder

HP Liu, Y Tsao, CS Fuh - Speech Communication, 2018 - Elsevier
Bone-conduction microphones (BCMs) capture speech signals based on the vibrations of
the speaker's skull and exhibit better noise-resistance capabilities than normal air …