A survey of speaker recognition: Fundamental theories, recognition methods and opportunities

MM Kabir, MF Mridha, J Shin, I Jahan, AQ Ohi - IEEE Access, 2021 - ieeexplore.ieee.org
Humans can identify a speaker by listening to their voice, over the telephone, or on any
digital devices. Acquiring this congenital human competency, authentication technologies …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Voxceleb: Large-scale speaker verification in the wild

A Nagrani, JS Chung, W Xie, A Zisserman - Computer Speech & Language, 2020 - Elsevier
The objective of this work is speaker recognition under noisy and unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale audio-visual dataset …

Voxceleb: a large-scale speaker identification dataset

A Nagrani, JS Chung, A Zisserman - arXiv preprint arXiv:1706.08612, 2017 - arxiv.org
Most existing datasets for speaker identification contain samples obtained under quite
constrained conditions, and are usually hand-annotated, hence limited in size. The goal of …

The fifth 'CHiME' speech separation and recognition challenge: dataset, task and baselines

J Barker, S Watanabe, E Vincent, J Trmal - arXiv preprint arXiv …, 2018 - arxiv.org
The CHiME challenge series aims to advance robust automatic speech recognition (ASR)
technology by promoting research at the interface of speech and language processing …

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

E Vincent, S Watanabe, AA Nugraha, J Barker… - Computer Speech & …, 2017 - Elsevier
Speech enhancement and automatic speech recognition (ASR) are most often evaluated in
matched (or multi-condition) settings where the acoustic conditions of the training data …

The speakers in the wild (SITW) speaker recognition database.

M McLaren, L Ferrer, D Castan, A Lawson - Interspeech, 2016 - maelfabien.github.io
The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated
speech samples from open-source media for the purpose of benchmarking text …

Speech recognition challenge in the wild: Arabic MGB-3

A Ali, S Vogel, S Renals - 2017 IEEE Automatic Speech …, 2017 - ieeexplore.ieee.org
This paper describes the Arabic MGB-3 Challenge: Arabic Speech Recognition in the Wild.
Unlike last year's Arabic MGB-2 Challenge, for which the recognition task was based on …

Capitalization and punctuation restoration: a survey

V Păiş, D Tufiş - Artificial Intelligence Review, 2022 - Springer
Ensuring proper punctuation and letter casing is a key pre-processing step towards applying
complex natural language processing algorithms. This is especially significant for textual …

Self-attention based model for punctuation prediction using word and speech embeddings

J Yi, J Tao - ICASSP 2019-2019 IEEE International Conference …, 2019 - ieeexplore.ieee.org
This paper proposes to use a self-attention based model to predict punctuation marks for word
sequences. The model is trained using word and speech embedding features which are …