Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems

D Ryumin, A Axyonov, E Ryumina, D Ivanko… - Expert Systems with …, 2024 - Elsevier
This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

JM Acosta-Triana, D Gimeno-Gómez… - arXiv preprint arXiv …, 2024 - arxiv.org
More than 7,000 known languages are spoken around the world. However, due to the lack
of annotated resources, only a small fraction of them are currently covered by speech …

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

J Peymanfard, V Saeedi, MR Mohammadi… - arXiv preprint arXiv …, 2023 - arxiv.org
Lip reading is a challenging task that has many potential applications in speech recognition,
human-computer interaction, and security systems. However, existing lip reading systems …