Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

MFA: TDNN with multi-scale frequency-channel attention for text-independent speaker verification with short utterances

T Liu, RK Das, KA Lee, H Li - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
The time delay neural network (TDNN) represents one of the state-of-the-art of neural
solutions to text-independent speaker verification. However, they require a large number of …

Whisper-SV: Adapting Whisper for low-data-resource speaker verification

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Speech Communication, 2024 - Elsevier
Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …

Personalized adversarial data augmentation for dysarthric and elderly speech recognition

Z Jin, M Geng, J Deng, T Wang, S Hu… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies targeting
normal speech, accurate recognition of dysarthric and elderly speech remains a highly …

End-to-end speaker identification research based on multi-scale SincNet and CGAN

G Wei, Y Zhang, H Min, Y Xu - Neural Computing and Applications, 2023 - Springer
Deep learning has improved the performance of speaker identification systems in recent
years, but it has also presented significant challenges. Typically, data-driven modeling …

VoiceExtender: Short-Utterance Text-Independent Speaker Verification With Guided Diffusion Model

Y He, Z Kang, J Wang, J Peng… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Speaker verification (SV) performance deteriorates as utterances become shorter. To this
end, we propose a new architecture called VoiceExtender which provides a promising …

FA-ExU-Net: the simultaneous training of an embedding extractor and enhancement model for a speaker verification system robust to short noisy utterances

J Kim, J Heo, H Shin, C Lim… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speaker verification (SV) technology has the potential to enhance personalization and
security in various applications, such as voice assistants, forensics, and access control …

Deep representation decomposition for rate-invariant speaker verification

F Tong, S Zheng, H Zhou, X Xie, Q Hong… - arXiv preprint arXiv …, 2022 - arxiv.org
While promising performance for speaker verification has been achieved by deep speaker
embeddings, the advantage would reduce in the case of speaking-style variability. Speaking …

Length- and noise-aware training techniques for short-utterance speaker recognition

W Chen, J Huang, T Bocklet - arXiv preprint arXiv:2008.12218, 2020 - arxiv.org
Speaker recognition performance has been greatly improved with the emergence of deep
learning. Deep neural networks show the capacity to effectively deal with impacts of noise …