A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification

B Desplanques, J Thienpondt, K Demuynck - arXiv preprint arXiv …, 2020 - arxiv.org
Current speaker verification techniques rely on a neural network to extract speaker
representations. The successful x-vector architecture is a Time Delay Neural Network …

Deep speaker embeddings for Speaker Verification: Review and experimental comparison

M Jakubec, R Jarina, E Lieskovska, P Kasak - Engineering Applications of …, 2024 - Elsevier
The construction of speaker-specific acoustic models for automatic speaker recognition is
almost exclusively based on deep neural network-based speaker embeddings. This work …

Wespeaker: A research and production oriented speaker embedding learning toolkit

H Wang, C Liang, S Wang, Z Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Speaker modeling is essential for many related tasks, such as speaker recognition and
speaker diarization. The dominant modeling approach is fixed-dimensional vector …

Clova baseline system for the VoxCeleb Speaker Recognition Challenge 2020

HS Heo, BJ Lee, J Huh, JS Chung - arXiv preprint arXiv:2009.14153, 2020 - arxiv.org
This report describes our submission to the VoxCeleb Speaker Recognition Challenge
(VoxSRC) at Interspeech 2020. We perform a careful analysis of speaker recognition models …

VoxSRC 2020: The second VoxCeleb Speaker Recognition Challenge

A Nagrani, JS Chung, J Huh, A Brown, E Coto… - arXiv preprint arXiv …, 2020 - arxiv.org
We held the second installment of the VoxCeleb Speaker Recognition Challenge in
conjunction with Interspeech 2020. The goal of this challenge was to assess how well …

Integrating frequency translational invariance in TDNNs and frequency positional information in 2D ResNets to enhance speaker verification

J Thienpondt, B Desplanques, K Demuynck - arXiv preprint arXiv …, 2021 - arxiv.org
This paper describes the IDLab submission for the text-independent task of the Short-
duration Speaker Verification Challenge 2021 (SdSVC-21). This speaker verification …

VoxSRC 2021: The third VoxCeleb Speaker Recognition Challenge

A Brown, J Huh, JS Chung, A Nagrani… - arXiv preprint arXiv …, 2022 - arxiv.org
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in
conjunction with Interspeech 2021. The aim of this challenge was to assess how well current …

VoxSRC 2022: The fourth VoxCeleb Speaker Recognition Challenge

J Huh, A Brown, J Jung, JS Chung, A Nagrani… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge
2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022. The goal of …

Advancing speaker embedding learning: Wespeaker toolkit for research and production

S Wang, Z Chen, B Han, H Wang, C Liang… - Speech …, 2024 - Elsevier
Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector
representations, known as speaker embeddings, are the predominant modeling approach …