Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press
We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

CREPE: A convolutional representation for pitch estimation

JW Kim, J Salamon, P Li… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The task of estimating the fundamental frequency of a monophonic sound recording, also
known as pitch tracking, is fundamental to audio processing with multiple applications in …

Sound event detection in domestic environments with weakly labeled data and soundscape synthesis

N Turpault, R Serizel, AP Shah… - Workshop on Detection …, 2019 - inria.hal.science
This paper presents Task 4 of the Detection and Classification of Acoustic Scenes and
Events (DCASE) 2019 challenge and provides a first analysis of the challenge results. The …

A review on speech emotion recognition: a survey, recent advances, challenges, and the influence of noise

SM George, PM Ilyas - Neurocomputing, 2024 - Elsevier
Affective Computing systems can detect the emotional state and mindset of an individual.
Speech Emotion Recognition (SER) is a unimodal affect computing system based on …

pYIN: A fundamental frequency estimator using probabilistic threshold distributions

M Mauch, S Dixon - 2014 IEEE International Conference on …, 2014 - ieeexplore.ieee.org
We propose the Probabilistic YIN (PYIN) algorithm, a modification of the well-known YIN
algorithm for fundamental frequency (F0) estimation. Conventional YIN is a simple yet …
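The pYIN entry describes a modification of YIN, which picks a single lag from a normalized difference function using one fixed threshold. pYIN instead uses a distribution over thresholds, as the title says. The sketch below covers only the conventional single-threshold YIN estimate for one frame, written with NumPy. It is not pYIN, and it is not the authors' code. The function name `yin_f0` and its defaults (`fmin`, `fmax`, `threshold=0.1`) are hypothetical choices. Interpolating on the normalized function rather than on the raw difference is a simplification.

```python
import numpy as np

def yin_f0(frame, sr, fmin=60.0, fmax=1000.0, threshold=0.1):
    """Hypothetical single-frame YIN F0 estimate (sketch, not pYIN)."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(sr / fmax)          # shortest lag considered
    tau_max = int(sr / fmin)          # longest lag considered
    w = len(frame) - tau_max          # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested fmin")
    if np.allclose(frame, 0.0):
        return None                   # silent frame: no pitch

    # Difference function: d(tau) = sum_j (x_j - x_{j+tau})^2
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[:w] - frame[tau:tau + w]
        d[tau] = np.dot(diff, diff)

    # Cumulative mean normalized difference: d'(0) = 1,
    # d'(tau) = d(tau) * tau / sum_{j=1..tau} d(j)
    cmnd = np.ones(tau_max + 1)
    running = np.cumsum(d[1:])
    taus = np.arange(1, tau_max + 1)
    cmnd[1:] = d[1:] * taus / np.where(running == 0, 1.0, running)

    # Absolute threshold: first lag below the threshold, then slide
    # down to the bottom of that dip; otherwise fall back to the minimum.
    search = cmnd[tau_min:tau_max + 1]
    below = np.nonzero(search < threshold)[0]
    if below.size:
        tau = tau_min + int(below[0])
        while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
            tau += 1
    else:
        tau = tau_min + int(np.argmin(search))

    # Parabolic interpolation around the chosen lag (on d', a simplification)
    shift = 0.0
    if tau_min < tau < tau_max:
        a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = a - 2.0 * b + c
        if denom != 0:
            shift = 0.5 * (a - c) / denom
    return sr / (tau + shift)

# Usage: a 440 Hz sine sampled at 16 kHz
sr = 16000
t = np.arange(2048) / sr
estimate = yin_f0(0.5 * np.sin(2 * np.pi * 440.0 * t), sr)
```

The fixed `threshold` is the single hard decision that the pYIN abstract calls conventional YIN. A different threshold can select a different dip, which is the sensitivity a probabilistic threshold is meant to address.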

A software framework for musical data augmentation

B McFee, EJ Humphrey, JP Bello - ISMIR, 2015 - Citeseer
Predictive models for music annotation tasks are practically limited by a paucity of well-
annotated training data. In the broader context of large-scale machine learning, the concept …

Sound event detection in synthetic domestic environments

R Serizel, N Turpault, A Shah… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
We present a comparative analysis of the performance of state-of-the-art sound event
detection systems. In particular, we study the robustness of the systems to noise and signal …

A unified perspective on multi-domain and multi-task learning

Y Yang, TM Hospedales - arXiv preprint arXiv:1412.7489, 2014 - arxiv.org
In this paper, we provide a new neural-network based perspective on multi-task learning
(MTL) and multi-domain learning (MDL). By introducing the concept of a semantic descriptor …

A simple method to determine if a music information retrieval system is a “horse”

BL Sturm - IEEE Transactions on Multimedia, 2014 - ieeexplore.ieee.org
We propose and demonstrate a simple method to explain the figure of merit (FoM) of a music
information retrieval (MIR) system evaluated in a dataset, specifically, whether the FoM …

PlethAugment: GAN-based PPG augmentation for medical diagnosis in low-resource settings

D Kiyasseh, GA Tadesse, L Thwaites… - IEEE Journal of …, 2020 - ieeexplore.ieee.org
The paucity of physiological time-series data collected from low-resource clinical settings
limits the capabilities of modern machine learning algorithms in achieving high performance …