ML-SUPERB: Multilingual speech universal performance benchmark

J Shi, D Berrebbi, W Chen, HL Chung, EP Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to
benchmark the performance of Self-Supervised Learning (SSL) models on various speech …

Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study

X Chang, B Yan, K Choi, JW Jung, Y Lu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Toward joint language modeling for speech units and text

JC Chou, CM Chien, WN Hsu, K Livescu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech and text are two major forms of human language. The research community has
been focusing on mapping speech to text or vice versa for many years. However, in the field …

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

S Wang, Z Chen, KA Lee, Y Qian, H Li - arXiv preprint arXiv:2407.15188, 2024 - arxiv.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Speech self-supervised representations benchmarking: a case for larger probing heads

S Zaiem, Y Kemiche, T Parcollet, S Essid… - Computer Speech & …, 2025 - Elsevier
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach
impressive performance with reduced amounts of annotated data. The high number of …

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

T Parcollet, H Nguyen, S Evain, MZ Boito… - Computer Speech & …, 2024 - Elsevier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …

MelHuBERT: A simplified HuBERT on Mel spectrograms

TQ Lin, H Lee, H Tang - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Self-supervised models have had great success in learning speech representations that can
generalize to various downstream tasks. However, most self-supervised models require a …