Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

ASVspoof 5: Crowdsourced speech data, deepfakes, and adversarial attacks at scale

X Wang, H Delgado, H Tak, J Jung, H Shim… - arXiv preprint arXiv …, 2024 - arxiv.org
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech
spoofing and deepfake attacks, and the design of detection solutions. Compared to previous …

Whisper-SV: Adapting Whisper for low-data-resource speaker verification

L Zhang, N Jiang, Q Wang, Y Li, Q Lu, L Xie - Speech Communication, 2024 - Elsevier
Trained on 680,000 h of massive speech data, Whisper is a multitasking, multilingual
speech foundation model demonstrating superior performance in automatic speech …

An enhanced Res2Net with local and global feature fusion for speaker verification

Y Chen, S Zheng, H Wang, L Cheng, Q Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Effective fusion of multi-scale features is crucial for improving speaker verification
performance. While most existing methods aggregate multi-scale features in a layer-wise …

Leveraging ASR pretrained Conformers for speaker verification through transfer learning and knowledge distillation

D Cai, M Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This paper focuses on the application of Conformers in speaker verification. Conformers,
initially designed for Automatic Speech Recognition (ASR), excel at modeling both local and …

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

T Liu, KA Lee, Q Wang, H Li - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Residual neural networks (ResNet) demonstrate impressive performance in
automatic speaker verification (ASV). They treat the time and frequency dimensions equally …

Pretraining Conformer with ASR for speaker verification

D Cai, W Wang, M Li, R Xia… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper proposes to pretrain Conformer with automatic speech recognition (ASR) task for
speaker verification. Conformer combines convolution neural network (CNN) and …

t-EER: Parameter-free tandem evaluation of countermeasures and biometric comparators

TH Kinnunen, KA Lee, H Tak, N Evans… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Presentation attack (spoofing) detection (PAD) typically operates alongside biometric
verification to improve reliability in the face of spoofing attacks. Even though the two sub …

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

J Jung, W Zhang, J Shi, Z Aldeneh, T Higuchi… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training
speaker embedding extractors. First, we provide an open-source platform for researchers in …

LightVoc: An upsampling-free GAN vocoder based on Conformer and inverse short-time Fourier transform

DS Dang, TL Nguyen, BT Ta, TT Nguyen… - Proc …, 2023 - isca-archive.org
Most neural vocoders based on generative adversarial networks (GANs) rely on iterative
upsampling to generate audio sequences from mel-spectrograms as well as dilated …