Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems

D Ryumin, A Axyonov, E Ryumina, D Ivanko… - Expert Systems with …, 2024 - Elsevier
This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …

Cross-lingual cross-age group adaptation for low-resource elderly speech emotion recognition

S Cahyawijaya, H Lovenia, W Chung, R Frieske… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech emotion recognition plays a crucial role in human-computer interactions. However,
most speech emotion recognition research is biased toward English-speaking adults, which …

IndoRobusta: towards robustness against diverse code-mixed Indonesian local languages

MF Adilazuarda, S Cahyawijaya, GI Winata… - arXiv preprint arXiv …, 2023 - arxiv.org
Significant progress has been made on Indonesian NLP. Nevertheless, exploration of the
code-mixing phenomenon in Indonesian is limited, despite many languages being …

OLKAVS: an open large-scale Korean audio-visual speech dataset

J Park, JW Hwang, K Choi, SH Lee… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Inspired by humans comprehending speech in a multi-modal manner, various audio-visual
datasets have been constructed. However, most existing datasets focus on English …

KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing

S Koo, C Park, J Kim, J Seo, S Eo… - Proceedings of the …, 2023 - aclanthology.org
Automatic Speech Recognition (ASR) systems are instrumental across various
applications, with their performance being critically tied to user satisfaction. Conventional …

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

S Koo, C Park, J Kim, J Seo, S Eo, H Moon… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic speech recognition (ASR) outcomes serve as input for downstream tasks,
substantially impacting the satisfaction level of end-users. Hence, the diagnosis and …

Audio-Visual Speech Recognition In-The-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-Based Method

A Axyonov, D Ryumin, D Ivanko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Audio-visual speech recognition (AVSR) gains increasing attention as an important part of
human-machine interaction. However, the publicly available corpora are limited, particularly …

Automatic speech recognition datasets in Cantonese: A survey and new dataset

T Yu, R Frieske, P Xu, S Cahyawijaya, CTS Yiu… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatic speech recognition (ASR) on low resource languages improves the access of
linguistic minorities to technological advantages provided by artificial intelligence (AI). In this …

Toward building another Arabic voice command dataset for multiple speech processing tasks

M Lichouri, K Lounnas, A Bakri - … International Conference on …, 2023 - ieeexplore.ieee.org
Expanding Internet connectivity has had tremendous influence on people's everyday life,
since they do everything on their phones and laptops [1]. Several items have been …

Cantonese sentence dataset for lip‐reading

Y Xiao, X Liu, L Teng, A Zhu, P Tian… - IET Image …, 2024 - Wiley Online Library
Lip‐reading deciphers speech by observing lip movements without relying on audio data.
The rapid advancements in deep learning have significantly improved lip‐reading for both …