The audio degradation toolbox and its application to robustness evaluation

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 3514 · Related articles · All 11 versions

[PDF] arxiv.org

CREPE: A convolutional representation for pitch estimation

JW Kim, J Salamon, P Li… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

The task of estimating the fundamental frequency of a monophonic sound recording, also
known as pitch tracking, is fundamental to audio processing with multiple applications in …

Cited by 480 · Related articles · All 13 versions

[PDF] hal.science

Sound event detection in domestic environments with weakly labeled data and soundscape synthesis

N Turpault, R Serizel, AP Shah… - Workshop on Detection …, 2019 - inria.hal.science

This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and
Events (DCASE) 2019 challenge and provides a first analysis of the challenge results. The …

Cited by 326 · Related articles · All 13 versions

A review on speech emotion recognition: a survey, recent advances, challenges, and the influence of noise

SM George, PM Ilyas - Neurocomputing, 2024 - Elsevier

Affective Computing systems can detect the emotional state and mindset of an individual.
Speech Emotion Recognition (SER) is a unimodal affect computing system based on …

Cited by 7 · Related articles · All 2 versions

[PDF] qmul.ac.uk

pYIN: A fundamental frequency estimator using probabilistic threshold distributions

M Mauch, S Dixon - 2014 IEEE International Conference on …, 2014 - ieeexplore.ieee.org

We propose the Probabilistic YIN (PYIN) algorithm, a modification of the well-known YIN
algorithm for fundamental frequency (F0) estimation. Conventional YIN is a simple yet …

Cited by 527 · Related articles · All 12 versions

[PDF] psu.edu

[PDF] A software framework for musical data augmentation

B McFee, EJ Humphrey, JP Bello - ISMIR, 2015 - Citeseer

Predictive models for music annotation tasks are practically limited by a paucity of well-
annotated training data. In the broader context of large-scale machine learning, the concept …

Cited by 198 · Related articles · All 8 versions

[PDF] hal.science

Sound event detection in synthetic domestic environments

R Serizel, N Turpault, A Shah… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

We present a comparative analysis of the performance of state-of-the-art sound event
detection systems. In particular, we study the robustness of the systems to noise and signal …

Cited by 124 · Related articles · All 7 versions

[PDF] arxiv.org

A unified perspective on multi-domain and multi-task learning

Y Yang, TM Hospedales - arXiv preprint arXiv:1412.7489, 2014 - arxiv.org

In this paper, we provide a new neural-network based perspective on multi-task learning
(MTL) and multi-domain learning (MDL). By introducing the concept of a semantic descriptor …

Cited by 196 · Related articles · All 12 versions

A simple method to determine if a music information retrieval system is a “horse”

BL Sturm - IEEE Transactions on Multimedia, 2014 - ieeexplore.ieee.org

We propose and demonstrate a simple method to explain the figure of merit (FoM) of a music
information retrieval (MIR) system evaluated in a dataset, specifically, whether the FoM …

Cited by 178 · Related articles · All 4 versions

[PDF] ieee.org

PlethAugment: GAN-based PPG augmentation for medical diagnosis in low-resource settings

D Kiyasseh, GA Tadesse, L Thwaites… - IEEE journal of …, 2020 - ieeexplore.ieee.org

The paucity of physiological time-series data collected from low-resource clinical settings
limits the capabilities of modern machine learning algorithms in achieving high performance …

Cited by 73 · Related articles · All 7 versions
