ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study

S Zaiem, R Algayres, T Parcollet… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org
Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech
Recognition (ASR) performance in low-resource settings. In this context, it has been …

Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch

GA Wright, U Cappellazzo, S Zaiem… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org
The ability to dynamically adjust the computational load of neural models during inference is
crucial for on-device processing scenarios characterised by limited and time-varying …

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

GA Wright, U Cappellazzo, S Zaiem, D Raj… - arXiv preprint arXiv …, 2023 - arxiv.org
The ability to dynamically adjust the computational load of neural models during inference is
crucial for on-device processing scenarios characterised by limited and time-varying …

Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization

Y Fujita, T Ogawa, T Kobayashi - IEEE Access, 2023 - ieeexplore.ieee.org
This paper presents a speaker diarization model that incorporates label dependency via
intermediate predictions. The proposed method is categorized as an end-to-end neural …

A Study on Speaker Diarization based on End-to-end Optimization

Y Fujita - 2023 - waseda.repo.nii.ac.jp
This dissertation addresses speaker diarization, the task of partitioning a multi-talker audio
recording into speaker-wise segments. Speaker diarization has long been studied as a …

Block Refinement Learning for Improving Early Exit in Autoregressive ASR

N Kawata, S Orihashi, S Suzuki, T Tanaka, M Ihori… - apsipa2024.org
While the autoregressive transformer models of automatic speech recognition (ASR) are
highly accurate, the inference time is long because of their sequential decoding. Early exit is …