Deep learning methods in speaker recognition: a review

D Sztahó, G Szaszák, A Beke - arXiv preprint arXiv:1911.06615, 2019 - arxiv.org
This paper summarizes the applied deep learning practices in the field of speaker
recognition, both verification and identification. Speaker recognition has been a widely used …

X-vectors: Robust DNN embeddings for speaker recognition

D Snyder, D Garcia-Romero, G Sell… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
In this paper, we use data augmentation to improve performance of deep neural network
(DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate …

Deep neural network embeddings for text-independent speaker verification

D Snyder, D Garcia-Romero, D Povey, S Khudanpur - Interspeech, 2017 - isca-archive.org
This paper investigates replacing i-vectors for text-independent speaker verification with
embeddings extracted from a feedforward deep neural network. Long-term speaker …

Speaker recognition for multi-speaker conversations using x-vectors

D Snyder, D Garcia-Romero, G Sell… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Recently, deep neural networks that map utterances to fixed-dimensional embeddings have
emerged as the state-of-the-art in speaker recognition. Our prior work introduced x-vectors …

Deep neural network-based speaker embeddings for end-to-end speaker verification

D Snyder, P Ghahremani, D Povey… - 2016 IEEE spoken …, 2016 - ieeexplore.ieee.org
In this study, we investigate an end-to-end text-independent speaker verification system. The
architecture consists of a deep neural network that takes a variable length speech segment …

Attentive temporal pooling for conformer-based streaming language identification in long-form speech

Q Wang, Y Yu, J Pelecanos, Y Huang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce a novel language identification system based on conformer
layers. We propose an attentive temporal pooling mechanism to allow the model to carry …

Memory storable network based feature aggregation for speaker representation learning

B Gu, W Guo, J Zhang - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Learning fixed-dimensional speaker representation using deep neural networks is a key
step in speaker verification. In this work, we propose an auxiliary memory storable network …

A dynamic convolution framework for session-independent speaker embedding learning

B Gu, J Zhang, W Guo - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Speaker verification (SV) has suffered from session variability in complex acoustic
scenarios, and learning session independent speaker representations remains a …

X-Vectors: Robust neural embeddings for speaker recognition

D Snyder - 2020 - jscholarship.library.jhu.edu
Speaker recognition is the task of identifying speakers based on their speech signal.
Typically, this involves comparing speech from a known speaker, with recordings from …

A Bayesian attention neural network layer for speaker recognition

W Zhu, J Pelecanos - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Neural network based attention modeling has found utility in areas such as visual analysis,
speech recognition and more recently speaker recognition. Attention represents a gating (or …