Task agnostic and task specific self-supervised learning from speech with LeBenchmark

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu

The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks:(i) Simultaneous speech translation,(ii) Offline …

Cited by 109 · Related articles · All 17 versions

[PDF] arxiv.org

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

T Parcollet, H Nguyen, S Evain, MZ Boito… - Computer Speech & …, 2024 - Elsevier

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …

Cited by 20 · Related articles · All 13 versions

[PDF] thecvf.com

Probing sentiment-oriented pre-training inspired by human sentiment perception mechanism

T Feng, J Liu, J Yang - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Pre-training of deep convolutional neural networks (DCNNs) plays a crucial role in the field
of visual sentiment analysis (VSA). Most proposed methods employ the off-the-shelf …

Cited by 9 · Related articles · All 3 versions

[PDF] arxiv.org

XTREME-S: Evaluating cross-lingual speech representations

A Conneau, A Bapna, Y Zhang, M Ma… - arXiv preprint arXiv …, 2022 - arxiv.org

We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech
representations in many languages. XTREME-S covers four task families: speech …

Cited by 21 · Related articles · All 8 versions

[PDF] arxiv.org

Exploring capabilities of monolingual audio transformers using large datasets in automatic speech recognition of Czech

J Lehečka, J Švec, A Pražák, JV Psutka - arXiv preprint arXiv:2206.07627, 2022 - arxiv.org

In this paper, we present our progress in pretraining Czech monolingual audio transformers
from a large dataset containing more than 80 thousand hours of unlabeled speech, and …

Cited by 19 · Related articles · All 5 versions

[PDF] arxiv.org

A study of gender impact in self-supervised models for speech-to-text systems

MZ Boito, L Besacier, N Tomashenko… - arXiv preprint arXiv …, 2022 - arxiv.org

Self-supervised models for speech processing emerged recently as popular foundation
blocks in speech processing pipelines. These models are pre-trained on unlabeled audio …

Cited by 18 · Related articles · All 6 versions

[PDF] arxiv.org

ON-TRAC consortium systems for the IWSLT 2022 dialect and low-resource speech translation tasks

MZ Boito, J Ortega, H Riguidel, A Laurent… - arXiv preprint arXiv …, 2022 - arxiv.org

This paper describes the ON-TRAC Consortium translation systems developed for two
challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and …

Cited by 15 · Related articles · All 7 versions

[PDF] arxiv.org

Speech resources in the Tamasheq language

MZ Boito, F Bougares, F Barbier, S Gahbiche… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper we present two datasets for Tamasheq, a developing language mainly spoken
in Mali and Niger. These two datasets were made available for the IWSLT 2022 low …

Cited by 15 · Related articles · All 4 versions

[PDF] arxiv.org

Cross-domain voice activity detection with self-supervised representations

S Alisamir, F Ringeval, F Portet - arXiv preprint arXiv:2209.11061, 2022 - arxiv.org

Voice Activity Detection (VAD) aims at detecting speech segments on an audio signal, which
is a necessary first step for many today's speech based applications. Current state-of-the-art …

Cited by 4 · Related articles · All 2 versions

[PDF] hal.science

HATS: An open data set integrating human perception applied to the evaluation of automatic speech recognition metrics

T Bañeras-Roux, J Wottawa, M Rouvier… - … Conference on Text …, 2023 - Springer

Conventionally, Automatic Speech Recognition (ASR) systems are evaluated on
their ability to correctly recognize each word contained in a speech signal. In this context, the …

被引用次数：3 相关文章所有 8 个版本
