Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu, L Xie
arXiv preprint arXiv:2305.12493, 2023
Contextual information plays a crucial role in speech recognition, and incorporating it into end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for the bias task. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts the context phrases in an utterance using contextual embeddings and calculates a bias loss to assist in training the contextualized model. Our method achieves a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER on context phrases decreases by a relative 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation that occurs when using a larger biasing list.
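
The relative figures quoted in the abstract can be read as the standard relative reduction, (baseline WER - new WER) / baseline WER. The sketch below is a minimal illustration of that arithmetic only. The function name and the WER values passed to it are hypothetical and are not taken from the paper, which does not state absolute WERs in this abstract.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are absolute WERs in percent (e.g. 10.0 for 10%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical numbers chosen only to illustrate the formula; they are
# NOT results reported in the paper.
example = relative_wer_reduction(baseline_wer=10.0, new_wer=8.79)
print(f"{example:.1f}% relative WER reduction")
```

Under this reading, a relative improvement describes the change as a fraction of the baseline error, so the same relative figure corresponds to a smaller absolute change when the baseline WER is already low.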