Findings of the IWSLT 2022 Evaluation Campaign.

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

T Parcollet, H Nguyen, S Evain, MZ Boito… - Computer Speech & …, 2024 - Elsevier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …

Probing sentiment-oriented pre-training inspired by human sentiment perception mechanism

T Feng, J Liu, J Yang - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Pre-training of deep convolutional neural networks (DCNNs) plays a crucial role in the field
of visual sentiment analysis (VSA). Most proposed methods employ the off-the-shelf …

XTREME-S: Evaluating cross-lingual speech representations

A Conneau, A Bapna, Y Zhang, M Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech
representations in many languages. XTREME-S covers four task families: speech …

Exploring capabilities of monolingual audio transformers using large datasets in automatic speech recognition of Czech

J Lehečka, J Švec, A Pražák, JV Psutka - arXiv preprint arXiv:2206.07627, 2022 - arxiv.org
In this paper, we present our progress in pretraining Czech monolingual audio transformers
from a large dataset containing more than 80 thousand hours of unlabeled speech, and …

A study of gender impact in self-supervised models for speech-to-text systems

MZ Boito, L Besacier, N Tomashenko… - arXiv preprint arXiv …, 2022 - arxiv.org
Self-supervised models for speech processing emerged recently as popular foundation
blocks in speech processing pipelines. These models are pre-trained on unlabeled audio …

ON-TRAC consortium systems for the IWSLT 2022 dialect and low-resource speech translation tasks

MZ Boito, J Ortega, H Riguidel, A Laurent… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper describes the ON-TRAC Consortium translation systems developed for two
challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and …

Speech resources in the Tamasheq language

MZ Boito, F Bougares, F Barbier, S Gahbiche… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper we present two datasets for Tamasheq, a developing language mainly spoken
in Mali and Niger. These two datasets were made available for the IWSLT 2022 low …

Cross-domain voice activity detection with self-supervised representations

S Alisamir, F Ringeval, F Portet - arXiv preprint arXiv:2209.11061, 2022 - arxiv.org
Voice Activity Detection (VAD) aims at detecting speech segments on an audio signal, which
is a necessary first step for many today's speech based applications. Current state-of-the-art …

HATS: An open data set integrating human perception applied to the evaluation of automatic speech recognition metrics

T Bañeras-Roux, J Wottawa, M Rouvier… - … Conference on Text …, 2023 - Springer
Conventionally, Automatic Speech Recognition (ASR) systems are evaluated on
their ability to correctly recognize each word contained in a speech signal. In this context, the …