Small footprint multi-channel network for keyword spotting with centroid based awareness

D Ng, Y Xiao, JQ Yip, Z Yang, B Tian, Q Fu… - Proc …, 2023 - isca-archive.org
Abstract Spoken Keyword Spotting (KWS) in noisy far-field environments is challenging for
small-footprint models, given the restrictions on computational resources (e.g., model size …

MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition

H Wang, P Guo, P Zhou, L Xie - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
While automatic speech recognition (ASR) systems degrade significantly in noisy
environments, audio-visual speech recognition (AVSR) systems aim to complement the …

Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

H Wang, M Cheng, Q Fu, M Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
In recent years, neural network-based Wake Word Spotting has achieved good performance on
clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting …

CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition

H Wang, X Wan, N Zheng, K Liu, H Zhou, G Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Code-switching automatic speech recognition (ASR) aims to transcribe speech that contains
two or more languages accurately. To better capture language-specific speech …

The WHU Wake Word Lipreading System for the 2024 Chat-Scenario Chinese Lipreading Challenge

H Wang, C Li, F Su, J Liu, H Suo… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
The paper describes the Wake Word Lipreading system developed by the WHU team for the
ChatCLR Challenge 2024. Although Lipreading and Wake Word Spotting have seen …

Audio-Visual Wake-up Word Spotting Under Noisy and Multi-person Scenarios

C Li, F Su, J Liu - International Conference on Pattern Recognition, 2024 - Springer
The existing audio-visual wake-up word spotting (AVWWS) methods assume that the audio
signal has been aligned with the lip movement video signal of a specific speaker in noisy …

Enhancing Visual Wake Word Spotting with Pretrained Model and Feature Balance Scaling

X Huang, S Wang, J Yan, K Tang… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Wake word spotting mainly focuses on the audio modality or audio-visual multimodal exploration.
The visual modality delivers stable outcomes under poor acoustic conditions, making visual …