Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Towards contextual spelling correction for customization of end-to-end speech recognition systems

X Wang, Y Liu, J Li, V Miljanic, S Zhao… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org
Contextual biasing is an important and challenging task for end-to-end automatic speech
recognition (ASR) systems, which aims to achieve better recognition performance by biasing …

FoundationTTS: Text-to-speech for ASR customization with generative language model

R Xue, Y Liu, L He, X Tan, L Liu, E Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
Neural text-to-speech (TTS) generally consists of cascaded architecture with separately
optimized acoustic model and vocoder, or end-to-end architecture with continuous mel …

Improving contextual spelling correction by external acoustics attention and semantic aware data augmentation

X Wang, Y Liu, J Li, S Zhao - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end
(E2E) automatic speech recognition (ASR) models with contextual information such as …

Contextual Spelling Correction with Large Language Models

G Song, Z Wu, G Pundak, A Chandorkar… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Contextual Spelling Correction (CSC) models are used to improve automatic speech
recognition (ASR) quality given user-specific context. Typically, context is modeled as a large …

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

A Antonova, E Bakhturina, B Ginsburg - arXiv preprint arXiv:2306.02317, 2023 - arxiv.org
Contextual spelling correction models are an alternative to shallow fusion to improve
automatic speech recognition (ASR) quality given user vocabulary. To deal with large user …

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Z Wu, G Song, C Li, P Rondon, Z Meng, X Velez… - arXiv preprint arXiv …, 2024 - arxiv.org
Contextual biasing enables speech recognizers to transcribe important phrases in the
speaker's context, such as contact names, even if they are rare in, or absent from, the …

Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting

K Feng, Z Tang, J Li, M Zhang - CCF International Conference on Natural …, 2023 - Springer
Recent studies have revealed that grammatical error correction methods in the sequence-to-sequence
paradigm are vulnerable to adversarial attacks. Large Language Models (LLMs) …

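Text proofreading model for speech recognition based on Chinese semantic-phonological information

仲美玉, 吴培良, 窦燕, 刘毅, 孔令富 - Journal on Communications, 2022 - infocomm-journal.com
To study the effect of pinyin on detecting and correcting errors in speech recognition text,
a text proofreading model based on Chinese semantic-phonological information is proposed.
Five pinyin encoding methods are defined to construct character-phonological embedding vectors …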

Have best of both worlds: Two-pass hybrid and E2E cascading framework for speech recognition

G Ye, V Mazalov, J Li, Y Gong - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Hybrid and end-to-end (E2E) systems have their individual advantages, with different error
patterns in the speech recognition results. By jointly modeling audio and text, the E2E model …