Academic Resource Search

5 results (0.03 seconds)

VHASR: A Multimodal Speech Recognition System With Vision Hotwords

J Hu, Z Li, P Wang, H Ai, L Zhang, H Zhao - arXiv preprint arXiv …, 2024 - arxiv.org

The image-based multimodal automatic speech recognition (ASR) model enhances speech
recognition performance by incorporating audio-related image. However, some works …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Y Wu, Y Peng, Y Lu, X Chang, R Song… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Visual signals can enhance audiovisual speech recognition accuracy by providing
additional contextual information. Given the complexity of visual signals, an audiovisual …

Automatic Speaker Verification on Myanmar Spoofing Voice Data using GMM-UBM and TDNN

WLL Phyu, WP Pa, HMS Naing - Proceedings of the 6th ACM …, 2024 - dl.acm.org

Artificial voices or human voice imitation pose a risk to speech verification security systems.
This study investigates the effectiveness of an automatic speaker verification that utilizes …

Does Image Help? A Survey On Images Based Multimodal Automatic Speech Recognition

J Hu, Z Li - Proceedings of the 6th ACM International Conference …, 2024 - dl.acm.org

Multimodal automatic speech recognition (MASR) enhances speech recognition accuracy
by incorporating other modal information. Recently, significant advancements have been …

[PDF] upv.es

[PDF] Exploring multimodal foundation models to improve interaction for people with speech impairments

I Ferri Mollá - 2023 - riunet.upv.es

[ES] People with pronunciation difficulties, often stemming from physiological or cognitive
pathologies, face significant challenges when using technologies for …
