A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Speaker recognition by machines and humans: A tutorial review

JHL Hansen, T Hasan - IEEE Signal Processing Magazine, 2015 - ieeexplore.ieee.org
Identifying a person by his or her voice is an important human trait most take for granted in
natural human-to-human interaction/communication. Speaking to someone over the …

VoxCeleb2: Deep speaker recognition

JS Chung, A Nagrani, A Zisserman - arXiv preprint arXiv:1806.05622, 2018 - arxiv.org
The objective of this paper is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual speaker …

Deep Speaker: an end-to-end neural speaker embedding system

C Li, X Ma, B Jiang, X Li, X Zhang, X Liu, Y Cao… - arXiv preprint arXiv …, 2017 - arxiv.org
We present Deep Speaker, a neural speaker embedding system that maps utterances to a
hypersphere where speaker similarity is measured by cosine similarity. The embeddings …

Analysis of i-vector length normalization in speaker recognition systems

D Garcia-Romero, CY Espy-Wilson - Interspeech, 2011 - isca-archive.org
We present a method to boost the performance of probabilistic generative models that work
with i-vector representations. The proposed approach deals with the non-Gaussian behavior …

Multi-modal emotion recognition using EEG and speech signals

Q Wang, M Wang, Y Yang, X Zhang - Computers in Biology and Medicine, 2022 - Elsevier
Automatic Emotion Recognition (AER) is critical for naturalistic Human–Machine
Interactions (HMI). Emotions can be detected through both external behaviors, e.g., tone of …

Unsupervised domain adaptation via domain adversarial training for speaker recognition

Q Wang, W Rao, S Sun, L Xie… - 2018 IEEE international …, 2018 - ieeexplore.ieee.org
The i-vector approach to speaker recognition has achieved good performance when the
domain of the evaluation dataset is similar to that of the training dataset. However, in …

Deep feature for text-dependent speaker verification

Y Liu, Y Qian, N Chen, T Fu, Y Zhang, K Yu - Speech Communication, 2015 - Elsevier
Recently deep learning has been successfully used in speech recognition, however it has
not been carefully explored and widely accepted for speaker verification. To incorporate …

PLDA for speaker verification with utterances of arbitrary duration

P Kenny, T Stafylakis, P Ouellet… - … , Speech and Signal …, 2013 - ieeexplore.ieee.org
The duration of speech segments has traditionally been controlled in the NIST speaker
recognition evaluations so that researchers working in this framework have been relieved of …

Voice in ear: Spoofing-resistant and passphrase-independent body sound authentication

Y Gao, Y Jin, J Chauhan, S Choi, J Li… - Proceedings of the ACM on …, 2021 - dl.acm.org
With the rapid growth of wearable computing and increasing demand for mobile
authentication scenarios, voiceprint-based authentication has become one of the prevalent …