Towards learning a universal non-semantic representation of speech

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 295 Related articles All 10 versions

[HTML] cell.com

[HTML] Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com

Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Cited by 98 Related articles All 12 versions

[PDF] arxiv.org

MusicLM: Generating music from text

A Agostinelli, TI Denk, Z Borsos, J Engel… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce MusicLM, a model generating high-fidelity music from text descriptions such
as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of …

Cited by 418 Related articles All 6 versions

[PDF] arxiv.org

SUPERB: Speech processing universal performance benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org

Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …

Cited by 769 Related articles All 11 versions

[PDF] arxiv.org

Noise2Music: Text-conditioned music generation with diffusion models

Q Huang, DS Park, T Wang, TI Denk, A Ly… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Noise2Music, where a series of diffusion models is trained to generate
high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator …

Cited by 123 Related articles All 5 versions

[PDF] neurips.cc

Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc

In the domain of audio processing, Transfer Learning has facilitated the rise of
Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Cited by 59 Related articles All 5 versions

[PDF] arxiv.org

BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition

Y Zhang, DS Park, W Han, J Qin… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …

Cited by 165 Related articles All 4 versions

[PDF] arxiv.org

Contrastive learning of general-purpose audio representations

A Saeed, D Grangier… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

We introduce COLA, a self-supervised pre-training approach for learning a general-purpose
representation of audio. Our approach is based on contrastive learning: it learns a …

Cited by 269 Related articles All 7 versions

[PDF] arxiv.org

PSLA: Improving audio tagging with pretraining, sampling, labeling, and aggregation

Y Gong, YA Chung, J Glass - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

Audio tagging is an active research area and has a wide range of applications. Since the
release of AudioSet, great progress has been made in advancing model performance, which …

Cited by 160 Related articles All 6 versions

[PDF] arxiv.org

BYOL for audio: Self-supervised learning for general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org

Inspired by the recent progress in self-supervised learning for computer vision that
generates supervision using data augmentations, we explore a new general-purpose audio …

Cited by 164 Related articles All 5 versions
