A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Lipopolysaccharide-induced model of neuroinflammation: mechanisms of action, research application and future directions for its use

A Skrzypczak-Wiercioch, K Sałat - Molecules, 2022 - mdpi.com
Despite advances in antimicrobial and anti-inflammatory therapies, inflammation and its
consequences still remain a significant problem in medicine. Acute inflammatory responses …

Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods haverevolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Voicefilter: Targeted voice separation by speaker-conditioned spectrogram masking

Q Wang, H Muckenhirn, K Wilson, P Sridhar… - arXiv preprint arXiv …, 2018 - arxiv.org
In this paper, we present a novel system that separates the voice of a target speaker from
multi-speaker signals, by making use of a reference signal from the target speaker. We …

Speaker recognition for multi-speaker conversations using x-vectors

D Snyder, D Garcia-Romero, G Sell… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Recently, deep neural networks that map utterances to fixed-dimensional embeddings have
emerged as the state-of-the-art in speaker recognition. Our prior work introduced x-vectors …

End-to-end neural speaker diarization with self-attention

Y Fujita, N Kanda, S Horiguchi, Y Xue… - 2019 IEEE Automatic …, 2019 - ieeexplore.ieee.org
Speaker diarization has been mainly developed based on the clustering of speaker
embeddings. However, the clustering-based approach has two major problems; ie,(i) it is not …

End-to-end neural speaker diarization with permutation-free objectives

Y Fujita, N Kanda, S Horiguchi, K Nagamatsu… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we propose a novel end-to-end neural-network-based speaker diarization
method. Unlike most existing methods, our proposed method does not have separate …

The third DIHARD diarization challenge

N Ryant, P Singh, V Krishnamohan, R Varma… - arXiv preprint arXiv …, 2020 - arxiv.org
DIHARD III was the third in a series of speaker diarization challenges intended to improve
the robustness of diarization systems to variability in recording equipment, noise conditions …