Improving automatic speech recognition performance for low-resource languages with self-supervised models

J Zhao, WQ Zhang - IEEE Journal of Selected Topics in Signal …, 2022 - ieeexplore.ieee.org
Speech self-supervised learning has attracted much attention due to its promising
performance in multiple downstream tasks, and has become a new growth engine for …

PARP: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Parameter efficient transfer learning for various speech processing tasks

S Otake, R Kawakami, N Inoue - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Fine-tuning of self-supervised models is a powerful transfer learning method in a variety of
fields, including speech processing, since it can utilize generic feature representations …

Losses can be blessings: Routing self-supervised speech representations towards efficient multilingual and multitask speech processing

Y Fu, Y Zhang, K Qian, Z Ye, Z Yu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Self-supervised learning (SSL) for rich speech representations has achieved empirical
success in low-resource Automatic Speech Recognition (ASR) and other speech processing …

Wav2vec-S: Semi-supervised pre-training for low-resource ASR

H Zhu, L Wang, J Wang, G Cheng, P Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Self-supervised pre-training could effectively improve the performance of low-resource
automatic speech recognition (ASR). However, existing self-supervised pre-training are task …

TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR

S Kumar, S Madikeri, J Zuluaga-Gomez… - Proceedings of the …, 2024 - aclanthology.org
In traditional conversational intelligence from speech, a cascaded pipeline is used, involving
tasks such as voice activity detection, diarization, transcription, and subsequent processing …

TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR

S Kumar, S Madikeri, J Zuluaga-Gomez… - arXiv preprint arXiv …, 2024 - arxiv.org
In traditional conversational intelligence from speech, a cascaded pipeline is used, involving
tasks such as voice activity detection, diarization, transcription, and subsequent processing …

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model with Multi-Task Finetuning

D Li, Y Ma, W Wei, Q Kong, Y Wu, M Che… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Instrument playing techniques (IPTs) constitute a pivotal component of musical expression.
However, the development of automatic IPT detection methods suffers from limited labeled …

Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages

S Udupa, J Bandekar, G Deekshitha… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this work, several methods have been proposed towards improving the performance of
dialectal automatic speech recognition (ASR). A novel encoder architecture has been …

Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent

Y Li, Y Sun, SM Naqvi - arXiv preprint arXiv:2112.11142, 2021 - arxiv.org
Recently, self-supervised learning (SSL) techniques have been introduced to solve the
monaural speech enhancement problem. Due to the lack of using clean phase information …