Bring dialogue-context into RNN-T for streaming ASR.

K Wei, B Li, H Lv, Q Lu, N Jiang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

Towards effective and compact contextual representation for conformer transducer speech recognition systems

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …

Cited by 8 · Related articles · All 5 versions

[PDF] arxiv.org

Improving large-scale deep biasing with phoneme features and text-only data in streaming transducer

J Qiu, L Huang, B Li, J Zhang, L Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Deep biasing for the Transducer can improve the recognition performance of rare words or
contextual entities, which is essential in practical applications, especially for streaming …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

ViLaS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

Z Ni, M Han, F Chen, L Meng, J Shi… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Enhancing automatic speech recognition (ASR) performance by leveraging additional
multimodal information has shown promising results in previous studies. However, most of …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org
