Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is the task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

In defence of metric learning for speaker recognition

JS Chung, J Huh, S Mun, M Lee, HS Heo… - arXiv preprint arXiv …, 2020 - arxiv.org
The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal
embeddings should be able to condense information into a compact utterance-level …

Speaker recognition from raw waveform with SincNet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges

R Jahangir, YW Teh, HF Nweke, G Mujtaba… - Expert Systems with …, 2021 - Elsevier
Speech is a powerful medium of communication that always conveys rich and useful
information, such as gender, accent, and other unique characteristics of a speaker. These …

A survey of speaker recognition: Fundamental theories, recognition methods and opportunities

MM Kabir, MF Mridha, J Shin, I Jahan, AQ Ohi - IEEE Access, 2021 - ieeexplore.ieee.org
Humans can identify a speaker by listening to their voice, over the telephone, or on any
digital device. Acquiring this congenital human competency, authentication technologies …

A survey of identity recognition via data fusion and feature learning

Z Qin, P Zhao, T Zhuang, F Deng, Y Ding, D Chen - Information Fusion, 2023 - Elsevier
With the rapid development of the Mobile Internet and the Industrial Internet of Things, a
variety of applications put forward an urgent demand for user and device identity …

Multi-modal multi-channel target speech separation

R Gu, SX Zhang, Y Xu, L Chen… - IEEE Journal of …, 2020 - ieeexplore.ieee.org
Target speech separation refers to extracting a target speaker's voice from an overlapped
audio of simultaneous talkers. Previously the use of visual modality for target speech …

Interpretable convolutional filters with SincNet

M Ravanelli, Y Bengio - arXiv preprint arXiv:1811.09725, 2018 - arxiv.org
Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.
This paradigm allows neural networks to learn complex and abstract representations, that …

Learnable PINs: Cross-modal embeddings for person identity

A Nagrani, S Albanie… - Proceedings of the …, 2018 - openaccess.thecvf.com
We propose and investigate an identity sensitive joint embedding of face and voice. Such an
embedding enables cross-modal retrieval from voice to face and from face to voice. We …