HEAR: Holistic evaluation of audio representations

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Cited by 280 Related articles All 10 versions

[PDF] cell.com

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com

Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Cited by 93 Related articles All 12 versions

[PDF] arxiv.org

CLAP: Learning audio concepts from natural language supervision

B Elizalde, S Deshmukh, M Al Ismail… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Mainstream machine listening models are trained to learn audio concepts under the
paradigm of one class label to many recordings focusing on one task. Learning under such …

Cited by 233 Related articles All 3 versions

[PDF] neurips.cc

Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc

In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Cited by 53 Related articles All 5 versions

[PDF] mlr.press

Masked spectrogram modeling using masked autoencoders for learning general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Evaluation of Audio …, 2022 - proceedings.mlr.press

Recent general-purpose audio representations show state-of-the-art performance on
various audio tasks. These representations are pre-trained by self-supervised learning …

Cited by 52 Related articles All 5 versions

[PDF] arxiv.org

Whisper-AT: Noise-robust automatic speech recognizers are also strong general audio event taggers

Y Gong, S Khurana, L Karlinsky, J Glass - arXiv preprint arXiv:2307.03183, 2023 - arxiv.org

In this paper, we focus on Whisper, a recent automatic speech recognition model trained
with a massive 680k hour labeled speech corpus recorded in diverse conditions. We first …

Cited by 45 Related articles All 8 versions

[PDF] nature.com

Global birdsong embeddings enable superior transfer learning for bioacoustic classification

B Ghani, T Denton, S Kahl, H Klinck - Scientific Reports, 2023 - nature.com

Automated bioacoustic analysis aids understanding and protection of both marine and
terrestrial animals and their habitats across extensive spatiotemporal scales, and typically …

Cited by 17 Related articles All 8 versions

[HTML] sciencedirect.com

[HTML] Learning to detect an animal sound from five examples

I Nolasco, S Singh, V Morfi, V Lostanlen… - Ecological …, 2023 - Elsevier

Automatic detection and classification of animal sounds has many applications in
biodiversity monitoring and animal behavior. In the past twenty years, the volume of digitised …

Cited by 27 Related articles All 18 versions

[PDF] ieee.org

BYOL for audio: Exploring pre-trained general-purpose audio representations

D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org

Pre-trained models are essential as feature extractors in modern machine learning systems
in various domains. In this study, we hypothesize that representations effective for general …

Cited by 40 Related articles All 5 versions

[PDF] neurips.cc

MARBLE: Music audio representation benchmark for universal evaluation

R Yuan, Y Ma, Y Li, G Zhang, X Chen… - Advances in …, 2023 - proceedings.neurips.cc

In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …

Cited by 9 Related articles All 7 versions
