A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Joint unsupervised and supervised training for multilingual ASR

J Bai, B Li, Y Zhang, A Bapna… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Self-supervised training has shown promising gains in pretraining models and facilitating downstream finetuning for speech recognition tasks such as multilingual ASR. Most existing …
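
A minimal sketch of the general idea of joint unsupervised and supervised training: optimize a weighted sum of a self-supervised loss on (possibly untranscribed) speech and a supervised loss on transcribed speech. The tiny encoder, the masked-frame reconstruction objective standing in for contrastive/MLM losses, the CTC supervised branch, and the weight `alpha` are all illustrative assumptions, not this paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    def __init__(self, n_mels=80, d=256, vocab=32):
        super().__init__()
        self.proj = nn.Linear(n_mels, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, vocab)    # supervised branch (CTC logits)
        self.recon = nn.Linear(d, n_mels)  # unsupervised branch (reconstruction)

    def forward(self, x):
        h, _ = self.rnn(self.proj(x))
        return self.head(h), self.recon(h)

def joint_loss(model, feats, labels, feat_lens, label_lens, alpha=0.5):
    # Unsupervised part: reconstruct masked frames (a simple stand-in for
    # the contrastive / masked-prediction objectives used in practice).
    mask = torch.rand(feats.shape[:2], device=feats.device) < 0.15
    masked = feats.masked_fill(mask.unsqueeze(-1), 0.0)
    logits, recon = model(masked)
    ssl_loss = F.mse_loss(recon[mask], feats[mask])
    # Supervised part: CTC loss on the transcribed utterances.
    log_probs = logits.log_softmax(-1).transpose(0, 1)  # (T, B, V) for ctc_loss
    sup_loss = F.ctc_loss(log_probs, labels, feat_lens, label_lens)
    # Single joint objective; both branches update the shared encoder.
    return sup_loss + alpha * ssl_loss
```

In this form both losses backpropagate through the same encoder in one step, which is the essential difference from the usual pretrain-then-finetune pipeline.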

DRAFT: A novel framework to reduce domain shifting in self-supervised learning and its application to children's ASR

R Fan, A Alwan - arXiv preprint arXiv:2206.07931, 2022 - arxiv.org
Self-supervised learning (SSL) in the pretraining stage using unannotated speech data has been successful in low-resource automatic speech recognition (ASR) tasks. However …

Towards better domain adaptation for self-supervised models: A case study of child ASR

R Fan, Y Zhu, J Wang, A Alwan - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased
attention in the automatic speech recognition (ASR) community. Typical SSL methods …

Pseudo label is better than human label

D Hwang, KC Sim, Z Huo, T Strohman - arXiv preprint arXiv:2203.12668, 2022 - arxiv.org
State-of-the-art automatic speech recognition (ASR) systems are trained with tens of
thousands of hours of labeled speech data. Human transcription is expensive and time …
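
A hedged sketch of generic pseudo-labeling (self-training) for ASR, the family of techniques this title refers to: a fixed teacher model transcribes unlabeled audio, and a student trains on the resulting (audio, pseudo-transcript) pairs. The confidence filter and the `transcribe`/`train_step` helpers are hypothetical placeholders, not this paper's API.

```python
import torch

@torch.no_grad()
def make_pseudo_labels(teacher, unlabeled_batches, min_conf=0.9):
    """Transcribe unlabeled audio with a frozen teacher model."""
    pairs = []
    for audio in unlabeled_batches:
        text, conf = teacher.transcribe(audio)  # hypothetical API
        if conf >= min_conf:                    # keep only confident labels
            pairs.append((audio, text))
    return pairs

def self_train(student, teacher, unlabeled_batches, optimizer):
    """Train the student on teacher-generated pseudo-labels."""
    for audio, text in make_pseudo_labels(teacher, unlabeled_batches):
        loss = student.train_step(audio, text)  # hypothetical API
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The appeal of this loop is that the marginal cost of a pseudo-label is a forward pass rather than a human transcription.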

Efficient domain adaptation for speech foundation models

B Li, D Hwang, Z Huo, J Bai, G Prakash… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have generated substantial interest in the research community …

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Z Chen, A Bapna, A Rosenberg… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Training state-of-the-art Automated Speech Recognition (ASR) models typically requires a
substantial amount of transcribed speech. In this work, we demonstrate that a modality …

TTS4Pretrain 2.0: Advancing the use of text and speech in ASR pretraining with consistency and contrastive losses

Z Chen, Y Zhang, A Rosenberg… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
An effective way to learn representations from untranscribed speech and unspoken text, using linguistic/lexical representations derived from synthesized speech, was introduced in …
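
A hedged sketch of the two loss families named in the title, applied to paired real and TTS-synthesized speech for the same text. The mean-pooling to an utterance vector, the cosine consistency objective, and the in-batch InfoNCE contrastive objective are simplifying assumptions; `encoder` is assumed to map (B, T, n_feats) features to (B, T, D) representations, and the paper's exact losses differ in detail.

```python
import torch
import torch.nn.functional as F

def consistency_loss(encoder, real_feats, synth_feats):
    # Pull representations of a real utterance and its synthetic rendering
    # together: 1 - cosine similarity of mean-pooled utterance vectors.
    h_real = encoder(real_feats).mean(dim=1)
    h_synth = encoder(synth_feats).mean(dim=1)
    return (1.0 - F.cosine_similarity(h_real, h_synth, dim=-1)).mean()

def contrastive_loss(encoder, real_feats, synth_feats, tau=0.1):
    # In-batch InfoNCE: each real utterance should match its own synthetic
    # rendering rather than the other utterances in the batch.
    h_r = F.normalize(encoder(real_feats).mean(dim=1), dim=-1)
    h_s = F.normalize(encoder(synth_feats).mean(dim=1), dim=-1)
    logits = h_r @ h_s.t() / tau  # (B, B) similarity matrix
    targets = torch.arange(h_r.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

The consistency term enforces agreement on positive pairs only, while the contrastive term additionally pushes apart mismatched utterances within the batch.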

Low-rank adaptation of large language model rescoring for parameter-efficient speech recognition

Y Yu, CHH Yang, J Kolehmainen… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We propose a neural language modeling system based on low-rank adaptation (LoRA) for
speech recognition output rescoring. Although pretrained language models (LMs) like BERT …
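
A minimal sketch of the general LoRA mechanism this entry builds on: the pretrained weight W is frozen and a low-rank update (alpha/r) * B @ A is learned on top of it. The `LoRALinear` wrapper and the rank/alpha values are illustrative; wiring it into a BERT-style rescoring LM is left abstract.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: A is small random, B starts at zero.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path plus low-rank correction; since B = 0 at init, the
        # adapted layer initially reproduces the pretrained one exactly.
        return self.base(x) + self.scale * (x @ self.A.t()) @ self.B.t()
```

Only A and B receive gradients, so the trainable parameter count per adapted layer drops from in_features * out_features to r * (in_features + out_features), which is what makes the rescoring finetuning parameter-efficient.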

PADA: Pruning-assisted domain adaptation for self-supervised speech representations

VS Lodagala, S Ghosh, S Umesh - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
While self-supervised speech representation learning (SSL) models serve a variety of
downstream tasks, these models have been observed to overfit to the domain from which the …
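
A hedged sketch of the pruning-assisted adaptation pattern: prune a pretrained SSL model's linear layers, then finetune the surviving weights on target-domain data so freed capacity is re-learned for the new domain. Plain L1 magnitude pruning via `torch.nn.utils.prune` is a simplifying assumption; PADA's specific pruning strategies differ.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_linear_layers(model: nn.Module, amount=0.3):
    """Zero out the `amount` fraction of smallest-magnitude weights in
    every nn.Linear; the pruning mask keeps them at zero during the
    subsequent target-domain finetuning."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
    return model
```

After finetuning, `prune.remove(module, "weight")` can be called on each pruned module to fold the mask into the weights permanently.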