Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

K Wei, B Li, H Lv, Q Lu, N Jiang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …

Towards effective and compact contextual representation for conformer transducer speech recognition systems

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Current ASR systems are mainly trained and evaluated at the utterance level. Long-range
cross-utterance context can be incorporated. A key task is to derive a suitable compact …

Improving large-scale deep biasing with phoneme features and text-only data in streaming transducer

J Qiu, L Huang, B Li, J Zhang, L Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Deep biasing for the Transducer can improve the recognition performance of rare words or
contextual entities, which is essential in practical applications, especially for streaming …

ViLaS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

Z Ni, M Han, F Chen, L Meng, J Shi… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Enhancing automatic speech recognition (ASR) performance by leveraging additional
multimodal information has shown promising results in previous studies. However, most of …

Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning

W Lee, S Kim, GG Lee - arXiv preprint arXiv:2408.06043, 2024 - arxiv.org
Recent dialogue systems rely on turn-based spoken interactions, requiring accurate
Automatic Speech Recognition (ASR). Errors in ASR can significantly impact downstream …

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR

M Cui, Y Yang, J Deng, J Kang, S Hu, T Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised learning (SSL) based discrete speech representations are highly compact
and domain adaptable. In this paper, SSL discrete speech features extracted from WavLM …

Exploring SSL Discrete Tokens for Multilingual ASR

M Cui, D Tan, Y Yang, D Wang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has
been growing interest in utilizing discrete tokens generated by SSL for automatic speech …