VHASR: A Multimodal Speech Recognition System With Vision Hotwords

J Hu, Z Li, P Wang, H Ai, L Zhang, H Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Image-based multimodal automatic speech recognition (ASR) models enhance speech
recognition performance by incorporating audio-related images. However, some works …

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Y Wu, Y Peng, Y Lu, X Chang, R Song… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Visual signals can enhance audiovisual speech recognition accuracy by providing
additional contextual information. Given the complexity of visual signals, an audiovisual …

Automatic Speaker Verification on Myanmar Spoofing Voice Data using GMM-UBM and TDNN

WLL Phyu, WP Pa, HMS Naing - Proceedings of the 6th ACM …, 2024 - dl.acm.org
Artificial voices or human voice imitation pose a risk to speech verification security systems.
This study investigates the effectiveness of an automatic speaker verification that utilizes …

Does Image Help? A Survey On Images Based Multimodal Automatic Speech Recognition

J Hu, Z Li - Proceedings of the 6th ACM International Conference …, 2024 - dl.acm.org
Multimodal automatic speech recognition (MASR) enhances speech recognition accuracy
by incorporating other modal information. Recently, significant advancements have been …

Exploring multimodal foundation models to improve interaction for people with speech impairments

I Ferri Mollá - 2023 - riunet.upv.es
People with pronunciation difficulties, often arising from physiological or cognitive
pathologies, face significant challenges when using technologies for …