A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) has achieved great success in speech recognition, but limited
exploration has been attempted for other speech processing tasks. As speech signal …
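
WavLM checkpoints are commonly consumed as frozen feature extractors for downstream speech tasks. A minimal sketch of that use, assuming the Hugging Face Transformers port and the microsoft/wavlm-base-plus checkpoint rather than the authors' original release:

    import torch
    from transformers import AutoFeatureExtractor, WavLMModel

    # Assumed checkpoint name; the paper's own release is distributed separately.
    extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
    model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus").eval()

    waveform = torch.randn(16000)  # stand-in for 1 s of 16 kHz mono audio
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        features = model(**inputs).last_hidden_state  # (batch, frames, hidden_size)

Downstream tasks such as speaker verification, diarization, or ASR typically attach a small head on top of these frame-level features, often on a learned weighted sum of the hidden layers.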

Masked autoencoders that listen

PY Huang, H Xu, J Li, A Baevski… - Advances in …, 2022 - proceedings.neurips.cc
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-
supervised representation learning from audio spectrograms. Following the Transformer …
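
The core pre-training idea is to patchify a spectrogram, hide most patches, and train an encoder-decoder to reconstruct the hidden ones. A minimal sketch of the patchify-and-mask step only; the shapes and mask ratio below are illustrative assumptions, not the paper's exact configuration:

    import torch

    def random_patch_mask(spec, patch=16, mask_ratio=0.8):
        """Split a (freq, time) spectrogram into non-overlapping patches and
        randomly hide a large fraction of them, MAE-style."""
        patches = spec.unfold(0, patch, patch).unfold(1, patch, patch)
        patches = patches.reshape(-1, patch * patch)   # (num_patches, patch*patch)
        n = patches.shape[0]
        keep = int(n * (1 - mask_ratio))
        perm = torch.randperm(n)
        visible_idx = perm[:keep]
        return patches[visible_idx], visible_idx        # encoder sees only visible patches

    spec = torch.randn(128, 1024)  # e.g. 128 mel bins x 1024 frames
    visible, idx = random_patch_mask(spec)

A lightweight decoder would then reconstruct the masked patches from the encoder output plus mask tokens, with a reconstruction loss computed only on the hidden patches.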

Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
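
SpeechBrain exposes its pretrained pipelines behind a small inference API. A sketch of transcribing a file with one of the published LibriSpeech recipes, assuming the speechbrain package is installed and the model id is available on the Hugging Face Hub (recent releases also expose the same classes under speechbrain.inference):

    from speechbrain.pretrained import EncoderDecoderASR

    # Model id assumed from the SpeechBrain Hugging Face organization.
    asr = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
    )
    print(asr.transcribe_file("example.wav"))  # path to a local 16 kHz recording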

BEATs: Audio pre-training with acoustic tokenizers

S Chen, Y Wu, C Wang, S Liu, D Tompkins… - arXiv preprint arXiv …, 2022 - arxiv.org
Self-supervised learning (SSL) has seen massive growth in the language, vision, speech, and
audio domains over the past few years. While discrete label prediction is …

SSAST: Self-supervised audio spectrogram transformer

Y Gong, CI Lai, YA Chung, J Glass - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Recently, neural networks based purely on self-attention, such as the Vision Transformer
(ViT), have been shown to outperform deep learning models constructed with convolutional …
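
SSAST pre-trains a ViT-style encoder directly on spectrogram patches; the authors distribute their checkpoints through their own repository. Purely as an illustration of the same spectrogram-patch interface, a sketch using the related (supervised) AST model that ships with Hugging Face Transformers; the checkpoint name is an assumption:

    import torch
    from transformers import ASTFeatureExtractor, ASTModel

    ckpt = "MIT/ast-finetuned-audioset-10-10-0.4593"  # supervised AST, illustration only
    extractor = ASTFeatureExtractor.from_pretrained(ckpt)
    model = ASTModel.from_pretrained(ckpt).eval()

    waveform = torch.randn(16000)  # stand-in for 1 s of 16 kHz mono audio
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        tokens = model(**inputs).last_hidden_state  # one embedding per spectrogram patch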

USB: A unified semi-supervised learning benchmark for classification

Y Wang, H Chen, Y Fan, W Sun… - Advances in …, 2022 - proceedings.neurips.cc
Semi-supervised learning (SSL) improves model generalization by leveraging massive
unlabeled data to augment limited labeled samples. However, currently, popular SSL …

SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing

J Ao, R Wang, L Zhou, C Wang, S Ren, Y Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural
language processing models, we propose a unified-modal SpeechT5 framework that …
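
SpeechT5 shares one encoder-decoder across spoken-language tasks, and its TTS variant is exposed in Hugging Face Transformers. A sketch of text-to-speech inference, assuming the microsoft/speecht5_tts and microsoft/speecht5_hifigan checkpoints; the zero speaker embedding is a placeholder, since real use needs an x-vector for the target voice:

    import torch
    from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text="Hello from SpeechT5.", return_tensors="pt")
    speaker_embeddings = torch.zeros(1, 512)  # placeholder; substitute a real x-vector

    speech = model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )  # 1-D tensor of 16 kHz audio samples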