FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs

K An, Q Chen, C Deng, Z Du, C Gao, Z Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …

Neural models of text normalization for speech applications

H Zhang, R Sproat, AH Ng, F Stahlberg… - Computational …, 2019 - direct.mit.edu
Abstract Machine learning, including neural network techniques, have been applied to
virtually every domain in natural language processing. One problem that has been …

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

PK O'Neill, V Lavrukhin, S Majumdar, V Noroozi… - arXiv preprint arXiv …, 2021 - arxiv.org
In the English speech-to-text (STT) machine learning task, acoustic models are
conventionally trained on uncased Latin characters, and any necessary orthography (such …

Neural inverse text normalization

M Sunkara, C Shivade, S Bodapati… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
While there have been several contributions exploring state of the art techniques for text
normalization, the problem of inverse text normalization (ITN) remains relatively unexplored …

Neural text normalization with subword units

C Mansfield, M Sun, Y Liu, A Gandhe… - Proceedings of the …, 2019 - aclanthology.org
Text normalization (TN) is an important step in conversational systems. It converts written
text to its spoken form to facilitate speech recognition, natural language understanding and …

NeMo inverse text normalization: From development to production

Y Zhang, E Bakhturina, K Gorman… - arXiv preprint arXiv …, 2021 - arxiv.org
Inverse text normalization (ITN) converts spoken-domain automatic speech recognition
(ASR) output into written-domain text to improve the readability of the ASR output. Many …

Text normalization using memory augmented neural networks

S Pramanik, A Hussain - Speech Communication, 2019 - Elsevier
We perform text normalization, i.e. the transformation of words from the written to the spoken
form, using a memory augmented neural network. With the addition of dynamic memory …

Streaming, fast and accurate on-device inverse text normalization for automatic speech recognition

Y Gaur, N Kibre, J Xue, K Shu, Y Wang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However,
humans prefer a written form output. To bridge this gap, ASR systems usually employ …

Transcribing speech as spoken and written dual text using an autoregressive model

M Ihori, H Sato, T Tanaka, R Masumura… - Proc. INTERSPEECH …, 2023 - isca-archive.org
This paper proposes a novel method to jointly generate spoken and written text from input
speech for expanding use cases of speech-based applications. The spoken text generated …

Multi transcription-style speech transcription using attention-based encoder-decoder model

Y Huang, P Behre, G Ye, S Chang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Human professional transcription services provide a variety of transcription styles to
customize different needs. To accommodate different users and facilitate seamless …