Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is …
A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …
This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …
KW Church, Z Chen, Y Ma - Natural Language Engineering, 2021 - cambridge.org
The previous Emerging Trends article (Church et al., 2021. Natural Language Engineering 27(5), 631–645.) introduced deep nets to poets. Poets is an imperfect …
Y Peng, S Dalmia, I Lane… - … Conference on Machine …, 2022 - proceedings.mlr.press
Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies …
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been …
YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to …
Human affective behavior analysis has received much attention in human-computer interaction (HCI). In this paper, we introduce our submission to the CVPR 2022 Competition …
E Morais, R Hoory, W Zhu, I Gat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Self-supervised pre-trained features have consistently delivered state-of-art results in the field of natural language processing (NLP); however, their merits in the field of speech …