A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …

[HTML][HTML] Voxceleb: Large-scale speaker verification in the wild

A Nagrani, JS Chung, W Xie, A Zisserman - Computer Speech & Language, 2020 - Elsevier
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …

In defence of metric learning for speaker recognition

JS Chung, J Huh, S Mun, M Lee, HS Heo… - arXiv preprint arXiv …, 2020 - arxiv.org
The objective of this paper is' open-set'speaker recognition of unseen speakers, where ideal
embeddings should be able to condense information into a compact utterance-level …

Voxceleb2: Deep speaker recognition

JS Chung, A Nagrani, A Zisserman - arXiv preprint arXiv:1806.05622, 2018 - arxiv.org
The objective of this paper is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual speaker …

Mfa-conformer: Multi-scale feature aggregation conformer for automatic speaker verification

Y Zhang, Z Lv, H Wu, S Zhang, P Hu, Z Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an
easy-to-implement, simple but effective backbone for automatic speaker verification based …

Voxceleb: a large-scale speaker identification dataset

A Nagrani, JS Chung, A Zisserman - arXiv preprint arXiv:1706.08612, 2017 - arxiv.org
Most existing datasets for speaker identification contain samples obtained under quite
constrained conditions, and are usually hand-annotated, hence limited in size. The goal of …

Utterance-level aggregation for speaker recognition in the wild

W Xie, A Nagrani, JS Chung… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
The objective of this paper is speaker recognitionin the wild'-where utterances may be of
variable length and also contain irrelevant signals. Crucial elements in the design of deep …

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Biometrics recognition using deep learning: A survey

S Minaee, A Abdolrashidi, H Su, M Bennamoun… - Artificial Intelligence …, 2023 - Springer
In the past few years, deep learning-based models have been very successful in achieving
state-of-the-art results in many tasks in computer vision, speech recognition, and natural …