Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Speaker identification features extraction methods: A systematic review

SS Tirumala, SR Shahamiri, AS Garhwal… - Expert Systems with …, 2017 - Elsevier
Speaker Identification (SI) is the process of identifying the speaker from a given utterance by
comparing the voice biometrics of the utterance with those utterance models stored …

Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation

Y Luo, N Mesgarani - IEEE/ACM transactions on audio, speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …

Speaker recognition from raw waveform with SincNet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

X-vectors: Robust DNN embeddings for speaker recognition

D Snyder, D Garcia-Romero, G Sell… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
In this paper, we use data augmentation to improve performance of deep neural network
(DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate …

Attentive statistics pooling for deep speaker embedding

K Okabe, T Koshinaka, K Shinoda - arXiv preprint arXiv:1803.10963, 2018 - arxiv.org
This paper proposes attentive statistics pooling for deep speaker embedding in text-
independent speaker verification. In conventional speaker embedding, frame-level features …

StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion

T Kaneko, H Kameoka, K Tanaka, N Hojo - arXiv preprint arXiv …, 2019 - arxiv.org
Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings
among multiple domains without relying on parallel data. This is important but challenging …

Real-time, universal, and robust adversarial attacks against speaker recognition systems

Y Xie, C Shi, Z Li, J Liu, Y Chen… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
As the popularity of voice user interface (VUI) exploded in recent years, speaker recognition
system has emerged as an important medium of identifying a speaker in many security …

Probing the information encoded in x-vectors

D Raj, D Snyder, D Povey… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Deep neural network based speaker embeddings, such as x-vectors, have been shown to
perform well in text-independent speaker recognition/verification tasks. In this paper, we use …

Deep representation learning in speech processing: Challenges, recent advances, and future trends

S Latif, R Rana, S Khalifa, R Jurdak, J Qadir… - arXiv preprint arXiv …, 2020 - arxiv.org
Research on speech processing has traditionally considered the task of designing hand-
engineered acoustic features (feature engineering) as a separate distinct problem from the …