Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems

D Ryumin, A Axyonov, E Ryumina, D Ivanko… - Expert Systems with …, 2024 - Elsevier

This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …

Cited by 4 · Related articles

[PDF] arxiv.org

Cross-lingual cross-age group adaptation for low-resource elderly speech emotion recognition

S Cahyawijaya, H Lovenia, W Chung, R Frieske… - arXiv preprint arXiv …, 2023 - arxiv.org

Speech emotion recognition plays a crucial role in human-computer interactions. However,
most speech emotion recognition research is biased toward English-speaking adults, which …

Cited by 3 · Related articles · All 4 versions

[PDF] arxiv.org

IndoRobusta: towards robustness against diverse code-mixed indonesian local languages

MF Adilazuarda, S Cahyawijaya, GI Winata… - arXiv preprint arXiv …, 2023 - arxiv.org

Significant progress has been made on Indonesian NLP. Nevertheless, exploration of the
code-mixing phenomenon in Indonesian is limited, despite many languages being …

Cited by 6 · Related articles · All 6 versions

[PDF] arxiv.org

OLKAVS: an open large-scale Korean audio-visual speech dataset

J Park, JW Hwang, K Choi, SH Lee… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Inspired by humans comprehending speech in a multi-modal manner, various audio-visual
datasets have been constructed. However, most existing datasets focus on English …

Cited by 5 · Related articles · All 3 versions

[PDF] aclanthology.org

KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing

S Koo, C Park, J Kim, J Seo, S Eo… - Proceedings of the …, 2023 - aclanthology.org

Abstract Automatic Speech Recognition (ASR) systems are instrumental across various
applications, with their performance being critically tied to user satisfaction. Conventional …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

S Koo, C Park, J Kim, J Seo, S Eo, H Moon… - arXiv preprint arXiv …, 2024 - arxiv.org

Automatic speech recognition (ASR) outcomes serve as input for downstream tasks,
substantially impacting the satisfaction level of end-users. Hence, the diagnosis and …

Cited by 1 · Related articles · All 2 versions

Audio-Visual Speech Recognition In-The-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-Based Method

A Axyonov, D Ryumin, D Ivanko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Audio-visual speech recognition (AVSR) gains increasing attention as an important part of
human-machine interaction. However, the publicly available corpora are limited, particularly …

Cited by 1 · Related articles

[PDF] arxiv.org

Automatic speech recognition datasets in cantonese: A survey and new dataset

T Yu, R Frieske, P Xu, S Cahyawijaya, CTS Yiu… - arXiv preprint arXiv …, 2022 - arxiv.org

Automatic speech recognition (ASR) on low resource languages improves the access of
linguistic minorities to technological advantages provided by artificial intelligence (AI). In this …

Cited by 6 · Related articles · All 9 versions

Toward building another arabic voice command dataset for multiple speech processing tasks

M Lichouri, K Lounnas, A Bakri - … International Conference on …, 2023 - ieeexplore.ieee.org

Expanding Internet connectivity has had tremendous influence on people's everyday life,
since they do everything on their phones and laptops [1]. Several items have been …

Cited by 3 · Related articles

[PDF] wiley.com Full View

Cantonese sentence dataset for lip‐reading

Y Xiao, X Liu, L Teng, A Zhu, P Tian… - IET Image …, 2024 - Wiley Online Library

Lip‐reading deciphers speech by observing lip movements without relying on audio data.
The rapid advancements in deep learning have significantly improved lip‐reading for both …