Bytes are all you need: End-to-end multilingual speech recognition and synthesis with bytes

B Li, Y Zhang, T Sainath, Y Wu… - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019 - ieeexplore.ieee.org
We present two end-to-end models, Audio-to-Byte (A2B) and Byte-to-Audio (B2A), for
multilingual speech recognition and synthesis. Prior work has predominantly used …
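
The core idea, output units defined directly over UTF-8 bytes so that every language shares the same fixed 256-symbol vocabulary, can be illustrated with a short sketch. This is a generic illustration of byte-level targets, not the A2B/B2A architecture itself:

```python
# Illustrative sketch (not from the paper): encoding multilingual transcripts
# as UTF-8 bytes, so ASR/TTS output units share one fixed 256-symbol vocabulary.

def text_to_byte_ids(text: str) -> list[int]:
    """Map a transcript to a sequence of byte IDs in [0, 255]."""
    return list(text.encode("utf-8"))

def byte_ids_to_text(ids: list[int]) -> str:
    """Invert the mapping; malformed sequences are replaced rather than crashing."""
    return bytes(ids).decode("utf-8", errors="replace")

if __name__ == "__main__":
    for transcript in ["hello world", "こんにちは", "привет"]:
        ids = text_to_byte_ids(transcript)
        print(len(transcript), len(ids), ids[:8], byte_ids_to_text(ids) == transcript)
```

One trade-off visible even in this sketch: multi-byte characters (e.g. CJK or Cyrillic text) expand into longer target sequences than character- or grapheme-based units would produce.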

From senones to chenones: Tied context-dependent graphemes for hybrid speech recognition

D Le, X Zhang, W Zheng, C Fügen… - 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019 - ieeexplore.ieee.org
There is an implicit assumption that traditional hybrid approaches for automatic speech
recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to …

Improving RNN transducer based ASR with auxiliary tasks

C Liu, F Zhang, D Le, S Kim, Y Saraf… - 2021 IEEE Spoken Language Technology Workshop (SLT), 2021 - ieeexplore.ieee.org
End-to-end automatic speech recognition (ASR) models with a single neural network have
recently demonstrated state-of-the-art results compared to conventional hybrid speech …
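
The auxiliary-task recipe alluded to in the snippet is conventionally written as a weighted sum of the primary transducer loss and one or more auxiliary losses; the notation below is a generic sketch with a placeholder weight λ, not the paper's exact objective:

```latex
% Generic multi-task objective: RNN-T loss plus a weighted auxiliary term.
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{RNN-T}} + \lambda \, \mathcal{L}_{\text{aux}}
```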

Sequence student-teacher training of deep neural networks

JHM Wong, MJF Gales - 2016 - repository.cam.ac.uk
The performance of automatic speech recognition can often be significantly improved by
combining multiple systems together. Though beneficial, ensemble methods can be …
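
Student-teacher (distillation) training in its simplest, frame-level form trains a single student to match the averaged posteriors of a teacher ensemble; the cited work extends this to sequence-level criteria, which the sketch below does not attempt to reproduce. The function names and NumPy-based setup are illustrative assumptions:

```python
# Illustrative sketch only: frame-level student-teacher (distillation) loss,
# where the student matches the averaged posteriors of a teacher ensemble.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_to_ensemble(student_logits: np.ndarray,
                   teacher_logits: list[np.ndarray]) -> float:
    """Mean KL(teacher_ensemble || student) over frames."""
    p_teacher = np.mean([softmax(t) for t in teacher_logits], axis=0)
    p_student = softmax(student_logits)
    eps = 1e-12
    kl = (p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))).sum(axis=-1)
    return float(kl.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, classes, n_teachers = 4, 10, 3
    student = rng.normal(size=(frames, classes))
    teachers = [rng.normal(size=(frames, classes)) for _ in range(n_teachers)]
    print(kl_to_ensemble(student, teachers))
```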

Towards language-universal end-to-end speech recognition

S Kim, ML Seltzer - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018 - ieeexplore.ieee.org
Building speech recognizers in multiple languages typically involves replicating a
monolingual training recipe for each language, or utilizing a multi-task learning approach …

Cross-lingual multi-speaker speech synthesis with limited bilingual training data

Z Cai, Y Yang, M Li - Computer Speech & Language, 2023 - Elsevier
Modeling voices for multiple speakers and multiple languages with one speech synthesis
system has been a challenge for a long time, especially in low-resource cases. This paper …

Learn and Don't Forget: Adding a New Language to ASR Foundation Models

M Qian, S Tang, R Ma, KM Knill, MJF Gales - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation ASR models often support many languages, e.g. 100 languages in Whisper.
However, there has been limited work on integrating an additional, typically low-resource …

Improving interpretability and regularization in deep learning

C Wu, MJF Gales, A Ragni… - IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017 - ieeexplore.ieee.org
Deep learning approaches yield state-of-the-art performance in a range of tasks, including
automatic speech recognition. However, the highly distributed representation in a deep …

The Kaldi OpenKWS System: Improving Low Resource Keyword Search

J Trmal, M Wiesner, V Peddinti, X Zhang… - Interspeech, 2017 - researchgate.net
The IARPA BABEL program has stimulated worldwide research in keyword search
technology for low resource languages, and the NIST OpenKWS evaluations are the de …

Multilingual graphemic hybrid ASR with massive data augmentation

C Liu, Q Zhang, X Zhang, K Singh, Y Saraf… - arXiv preprint arXiv …, 2019 - arxiv.org
Towards developing high-performing ASR for low-resource languages, approaches to
address the lack of resources are to make use of data from multiple languages, and to …