Text-independent speaker verification based on triplet convolutional neural network embeddings

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier

Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Cited by 439 · Related articles · All 9 versions

[PDF] arxiv.org

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 303 · Related articles · All 6 versions

[PDF] arxiv.org

In defence of metric learning for speaker recognition

JS Chung, J Huh, S Mun, M Lee, HS Heo… - arXiv preprint arXiv …, 2020 - arxiv.org

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal
embeddings should be able to condense information into a compact utterance-level …

Cited by 533 · Related articles · All 11 versions

[PDF] researchgate.net

Speaker recognition from raw waveform with SincNet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

Cited by 1012 · Related articles · All 10 versions

Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges

R Jahangir, YW Teh, HF Nweke, G Mujtaba… - Expert Systems with …, 2021 - Elsevier

Speech is a powerful medium of communication that always conveys rich and useful
information, such as gender, accent, and other unique characteristics of a speaker. These …

Cited by 125 · Related articles · All 4 versions

[PDF] ieee.org

A survey of speaker recognition: Fundamental theories, recognition methods and opportunities

MM Kabir, MF Mridha, J Shin, I Jahan, AQ Ohi - IEEE Access, 2021 - ieeexplore.ieee.org

Humans can identify a speaker by listening to their voice, over the telephone, or on any
digital device. Acquiring this congenital human competency, authentication technologies …

Cited by 126 · Related articles · All 4 versions

[HTML] sciencedirect.com

[HTML] A survey of identity recognition via data fusion and feature learning

Z Qin, P Zhao, T Zhuang, F Deng, Y Ding, D Chen - Information Fusion, 2023 - Elsevier

With the rapid development of the Mobile Internet and the Industrial Internet of Things, a
variety of applications put forward an urgent demand for user and device identity …

Cited by 44 · Related articles · All 2 versions

[PDF] arxiv.org

Multi-modal multi-channel target speech separation

R Gu, SX Zhang, Y Xu, L Chen… - IEEE Journal of …, 2020 - ieeexplore.ieee.org

Target speech separation refers to extracting a target speaker's voice from an overlapped
audio of simultaneous talkers. Previously the use of visual modality for target speech …

Cited by 116 · Related articles · All 5 versions

[PDF] researchgate.net

Interpretable convolutional filters with SincNet

M Ravanelli, Y Bengio - arXiv preprint arXiv:1811.09725, 2018 - arxiv.org

Deep learning is currently playing a crucial role toward higher levels of artificial intelligence.
This paradigm allows neural networks to learn complex and abstract representations that …

Cited by 151 · Related articles · All 5 versions

[PDF] thecvf.com

Learnable PINs: Cross-modal embeddings for person identity

A Nagrani, S Albanie… - Proceedings of the …, 2018 - openaccess.thecvf.com

We propose and investigate an identity sensitive joint embedding of face and voice. Such an
embedding enables cross-modal retrieval from voice to face and from face to voice. We …

Cited by 171 · Related articles · All 12 versions
