Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

F Landini, J Profant, M Diez, L Burget - Computer Speech & Language, 2022 - Elsevier
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to
find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …

End-to-end speaker diarization for an unknown number of speakers with encoder-decoder based attractors

S Horiguchi, Y Fujita, S Watanabe, Y Xue… - arXiv preprint arXiv …, 2020 - arxiv.org
End-to-end speaker diarization for an unknown number of speakers is addressed in this
paper. Recently proposed end-to-end speaker diarization outperformed conventional …

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

D Raj, P Denisov, Z Chen, H Erdogan… - 2021 IEEE spoken …, 2021 - ieeexplore.ieee.org
Multi-speaker speech recognition of unsegmented recordings has diverse applications such
as meeting transcription and automatic subtitle generation. With technical advances in …

DOVER-Lap: A method for combining overlap-aware diarization outputs

D Raj, LP Garcia-Perera, Z Huang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Several advances have been made recently towards handling overlapping speech for
speaker diarization. Since speech and natural language tasks often benefit from ensemble …

Encoder-decoder based attractors for end-to-end neural diarization

S Horiguchi, Y Fujita, S Watanabe… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org
This paper investigates an end-to-end neural diarization (EEND) method for an unknown
number of speakers. In contrast to the conventional cascaded approach to speaker …

Multi-channel conversational speaker separation via neural diarization

H Taherian, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
When dealing with overlapped speech, the performance of automatic speech recognition
(ASR) systems substantially degrades as they are designed for single-talker speech. To …

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

K Kinoshita, M Delcroix, N Tawara - arXiv preprint arXiv:2105.09040, 2021 - arxiv.org
Recently, we proposed a novel speaker diarization method called End-to-End-Neural-
Diarization-vector clustering (EEND-vector clustering) that integrates clustering-based and …

AVA-AVD: Audio-visual speaker diarization in the wild

EZ Xu, Z Song, S Tsutsui, C Feng, M Ye… - Proceedings of the 30th …, 2022 - dl.acm.org
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory
and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor …