Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work we perform an extensive …
End-to-end speaker diarization for an unknown number of speakers is addressed in this paper. Recently proposed end-to-end speaker diarization outperformed conventional …
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in …
Several advances have been made recently towards handling overlapping speech for speaker diarization. Since speech and natural language tasks often benefit from ensemble …
This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker …
H Taherian, DL Wang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To …
Recently, we proposed a novel speaker diarization method called End-to-End-Neural-Diarization-vector clustering (EEND-vector clustering) that integrates clustering-based and …
Audio-visual speaker diarization aims at detecting “who spoke when” using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor …