Modular deep learning

J Pfeiffer, S Ruder, I Vulić, EM Ponti - arXiv preprint arXiv:2302.11529, 2023 - arxiv.org
Transfer learning has recently become the dominant paradigm of machine learning.
Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer …

Adaptation of Whisper models to child speech recognition

R Jain, A Barcovschi, M Yiwere, P Corcoran… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech
due to the lack of large child speech datasets required to accurately train child-friendly ASR …

Exploring efficient-tuning methods in self-supervised speech models

ZC Chen, CL Fu, CY Liu, SWD Li… - 2022 IEEE spoken …, 2023 - ieeexplore.ieee.org
In this study, we aim to explore efficient tuning methods for speech self-supervised learning.
Recent studies show that self-supervised learning (SSL) can learn powerful representations …

Towards better domain adaptation for self-supervised models: A case study of child ASR

R Fan, Y Zhu, J Wang, A Alwan - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased
attention in the automatic speech recognition (ASR) community. Typical SSL methods …

A wav2vec2-based experimental study on self-supervised learning methods to improve child speech recognition

R Jain, A Barcovschi, MY Yiwere, D Bigioi… - IEEE …, 2023 - ieeexplore.ieee.org
Despite recent advancements in deep learning technologies, Child Speech Recognition
remains a challenging task. Current Automatic Speech Recognition (ASR) models require …

A CTC alignment-based non-autoregressive transformer for end-to-end automatic speech recognition

R Fan, W Chu, P Chang, A Alwan - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Recently, end-to-end models have been widely used in automatic speech recognition (ASR)
systems. Two of the most representative approaches are connectionist temporal …

Language fusion via adapters for low-resource speech recognition

Q Hu, Y Zhang, X Zhang, Z Han, X Liang - Speech Communication, 2024 - Elsevier
Data scarcity makes low-resource speech recognition systems suffer from severe overfitting.
Although fine-tuning addresses this issue to some extent, it leads to parameter-inefficient …

A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition

A Barcovschi, R Jain, P Corcoran - … International Conference on …, 2023 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) systems have progressed significantly in their
performance on adult speech data; however, transcribing child speech remains challenging …

Exploring Native and Non-Native English Child Speech Recognition With Whisper

R Jain, A Barcovschi, MY Yiwere, P Corcoran… - IEEE …, 2024 - ieeexplore.ieee.org
Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise
children's speech. This challenge is due to the high acoustic variability in children's voices …

Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

G Qi, Y Chen, X Mao, X Jia, R Duan, R Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Developing a practically-robust automatic speech recognition (ASR) is challenging since the
model should not only maintain the original performance on clean samples, but also achieve …