SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus

H Wang, F Yu, X Shi, Y Wang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Multi-Modal automatic speech recognition (ASR) techniques aim to leverage additional
modalities to improve the performance of speech recognition systems. While existing …

Improving large-scale deep biasing with phoneme features and text-only data in streaming transducer

J Qiu, L Huang, B Li, J Zhang, L Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Deep biasing for the Transducer can improve the recognition performance of rare words or
contextual entities, which is essential in practical applications, especially for streaming …

SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability

X Shi, Y Yang, Z Li, Y Chen, Z Gao… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Hotword customization is one of the concerned issues remained in ASR field - it is of value to
enable users of ASR systems to customize names of entities, persons and other phrases to …

Contextualized end-to-end automatic speech recognition with intermediate biasing loss

M Shakeel, Y Sudo, Y Peng, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Contextualized end-to-end automatic speech recognition has been an active research area,
with recent efforts focusing on the implicit learning of contextual phrases based on the final …

PromptASR for contextualized ASR with controllable style

X Yang, W Kang, Z Yao, Y Yang, L Guo… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Prompts are crucial to large language models as they provide context information such as
topic or logical relationships. Inspired by this, we propose PromptASR, a framework that …

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

K Huang, A Zhang, B Zhang, T Xu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The attention-based deep contextual biasing method has been demonstrated to effectively
improve the recognition performance of end-to-end automatic speech recognition (ASR) …

CTC-Assisted LLM-Based Contextual ASR

G Yang, Z Ma, Z Gao, S Zhang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Contextual ASR or hotword customization holds substantial practical value. Despite the
impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) …

Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search

Y Sudo, M Shakeel, Y Fukumoto… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) methods exhibit remarkable
performance. However, since the performance of such methods is intrinsically linked to the …

LCB-Net: Long-Context Biasing for Audio-Visual Speech Recognition

F Yu, H Wang, X Shi, S Zhang - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The growing prevalence of online conferences and courses presents a new challenge in
improving automatic speech recognition (ASR) with enriched textual information from video …

Improving ASR Contextual Biasing with Guided Attention

J Tang, K Kim, S Shon, F Wu… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the
effectiveness and robustness of automatic speech recognition (ASR) contextual biasing …