A multi-purpose audio-visual corpus for multi-modal Persian speech recognition: The Arman-AV dataset


Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems

D Ryumin, A Axyonov, E Ryumina, D Ivanko… - Expert Systems with …, 2024 - Elsevier

This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …

Cited by 1

[PDF] arxiv.org

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

JM Acosta-Triana, D Gimeno-Gómez… - arXiv preprint arXiv …, 2024 - arxiv.org

More than 7,000 known languages are spoken around the world. However, due to the lack
of annotated resources, only a small fraction of them are currently covered by speech …

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

J Peymanfard, V Saeedi, MR Mohammadi… - arXiv preprint arXiv …, 2023 - arxiv.org

Lip reading is a challenging task that has many potential applications in speech recognition,
human-computer interaction, and security systems. However, existing lip reading systems …
