Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

ConvMixer: Feature interactive convolution with curriculum learning for small footprint and noisy far-field keyword spotting

D Ng, Y Chen, B Tian, Q Fu… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Building efficient architecture in neural speech processing is paramount to success in
keyword spotting deployment. However, it is very challenging for lightweight models to …

Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances

Y Jung, SM Kye, Y Choi, M Jung, H Kim - arXiv preprint arXiv:2004.03194, 2020 - arxiv.org
Currently, the most widely used approach for speaker verification is the deep speaker
embedding learning. In this approach, we obtain a speaker embedding vector by pooling …

A unified deep learning framework for short-duration speaker verification in adverse environments

Y Jung, Y Choi, H Lim, H Kim - IEEE Access, 2020 - ieeexplore.ieee.org
Speaker verification (SV) has recently attracted considerable research interest due to the
growing popularity of virtual assistants. At the same time, there is an increasing requirement …

Towards on-device domain adaptation for noise-robust keyword spotting

C Cioflan, L Cavigelli, M Rusci… - 2022 IEEE 4th …, 2022 - ieeexplore.ieee.org
The accuracy of a keyword spotting model deployed on embedded devices often degrades
when the system is exposed to real environments with significant noise. In this paper, we …

A multi-task network for speaker and command recognition in industrial environments

S Bini, G Percannella, A Saggese, M Vento - Pattern Recognition Letters, 2023 - Elsevier
In industrial environments, it is crucial to establish a strong collaboration between humans
and robots to enhance productivity. However, the nature of the work demands that workers …

Audio-visual wake word spotting in MISP2021 challenge: Dataset release and deep analysis

H Zhou, J Du, G Zou, Z Nian, CH Lee… - Proceedings of the …, 2022 - research.tudelft.nl
In this paper, we describe and release publicly the audio-visual wake word spotting (WWS)
database in the MISP2021 Challenge, which covers a range of scenarios of audio and video …

Personalized keyword spotting through multi-task learning

S Yang, B Kim, I Chung, S Chang - arXiv preprint arXiv:2206.13708, 2022 - arxiv.org
Keyword spotting (KWS) plays an essential role in enabling speech-based user interaction
on smart devices, and conventional KWS (C-KWS) approaches have concentrated on …

A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection

Y Li, A Parsan, B Wang, P Dong, S Yao… - Engineering Applications of …, 2023 - Elsevier
Audio commands are a preferred communication medium to keep inspectors in the loop of
civil infrastructure inspection performed by a semi-autonomous drone. To understand job …