Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

Recent progress in the CUHK dysarthric speech recognition system

S Liu, M Geng, S Hu, X Xie, M Cui, J Yu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past
few decades, recognition of disordered speech remains a highly challenging task to date …

Visual speech recognition for multiple languages in the wild

P Ma, S Petridis, M Pantic - Nature Machine Intelligence, 2022 - nature.com
Visual speech recognition (VSR) aims to recognize the content of speech based on lip
movements, without relying on the audio stream. Advances in deep learning and the …

End-to-end audio-visual speech recognition with conformers

P Ma, S Petridis, M Pantic - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and
Convolution-augmented transformer (Conformer), that can be trained in an end-to-end …

Learning hierarchical cross-modal association for co-speech gesture generation

X Liu, Q Wu, H Zhou, Y Xu, R Qian… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating speech-consistent body and gesture movements is a long-standing problem in
virtual avatar creation. Previous studies often synthesize pose movement in a holistic …

Auto-avsr: Audio-visual speech recognition with automatic labels

P Ma, A Haliassos, A Fernandez-Lopez… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Audio-visual speech recognition has received a lot of attention due to its robustness against
acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech …

Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Audio-driven co-speech gesture video generation

X Liu, Q Wu, H Zhou, Y Du, W Wu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Co-speech gesture is crucial for human-machine interaction and digital entertainment. While
previous works mostly map speech audio to human skeletons (eg, 2D keypoints), directly …

Watch or listen: Robust audio-visual speech recognition with visual corruption modeling and reliability scoring

J Hong, M Kim, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper deals with Audio-Visual Speech Recognition (AVSR) under multimodal input
corruption situation where audio inputs and visual inputs are both corrupted, which is not …

Audio-visual efficient conformer for robust speech recognition

M Burchi, R Timofte - Proceedings of the IEEE/CVF Winter …, 2023 - openaccess.thecvf.com
Abstract End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …