VE-KWS: Visual modality enhanced end-to-end keyword spotting

D Ng, Y Xiao, JQ Yip, Z Yang, B Tian, Q Fu… - Proc …, 2023 - isca-archive.org

Abstract: Spoken Keyword Spotting (KWS) in noisy far-field environments is challenging for
small-footprint models, given the restrictions on computational resources (e.g., model size …

Cited by 9 · All 2 versions

MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition

H Wang, P Guo, P Zhou, L Xie - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

While automatic speech recognition (ASR) systems degrade significantly in noisy
environments, audio-visual speech recognition (AVSR) systems aim to complement the …

Cited by 14 · All 3 versions

Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

H Wang, M Cheng, Q Fu, M Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

In recent years, neural network-based Wake Word Spotting has achieved good performance on
clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting …

Cited by 3 · All 4 versions

CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition

H Wang, X Wan, N Zheng, K Liu, H Zhou, G Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Code-switching automatic speech recognition (ASR) aims to transcribe speech that contains
two or more languages accurately. To better capture language-specific speech …
