Robust speech activity detection in movie audio: Data resources and experimental evaluation

R Hebbar, K Somandepalli… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Speech activity detection in highly variable acoustic conditions is a challenging task. Many
approaches to detect speech activity in such conditions involve an inherent knowledge of …

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

D de Benito-Gorron, A Lozano-Diez… - EURASIP Journal on …, 2019 - Springer
Audio signals represent a wide diversity of acoustic events, from background environmental
noise to spoken communication. Machine learning models such as neural networks have …

Siamese style convolutional neural networks for sound search by vocal imitation

Y Zhang, B Pardo, Z Duan - IEEE/ACM Transactions on Audio …, 2018 - ieeexplore.ieee.org
Conventional methods for finding audio in databases typically search text labels, rather than
the audio itself. This can be problematic as labels may be missing, irrelevant to the audio …

DNN and CNN with weighted and multi-task loss functions for audio event detection

H Phan, M Krawczyk-Becker, T Gerkmann… - arXiv preprint arXiv …, 2017 - arxiv.org
This report presents our audio event detection system submitted for Task 2, "Detection of
rare sound events", of DCASE 2017 challenge. The proposed system is based on …

Audio retrieval with WavText5K and CLAP training

S Deshmukh, B Elizalde, H Wang - arXiv preprint arXiv:2209.14275, 2022 - arxiv.org
Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a
database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …

Language transfer of Audio Word2Vec: Learning audio segment representations without target language data

CH Shen, JY Sung, HY Lee - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Audio Word2Vec offers vector representations of fixed dimensionality for variable-length
audio segments using Sequence-to-sequence Autoencoder (SA). These vector …

Retrieval-augmented text-to-audio generation

Y Yuan, H Liu, X Liu, Q Huang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art
models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such …

SwishNet: A fast convolutional neural network for speech, music and noise classification and segmentation

MS Hussain, MA Haque - arXiv preprint arXiv:1812.00149, 2018 - arxiv.org
Speech, Music and Noise classification/segmentation is an important preprocessing step for
audio processing/indexing. To this end, we propose a novel 1D Convolutional Neural …

EEG2Mel: Reconstructing sound from brain responses to music

AG Ramirez-Aristizabal, C Kello - arXiv preprint arXiv:2207.13845, 2022 - arxiv.org
Information retrieval from brain responses to auditory and visual stimuli has shown success
through classification of song names and image classes presented to participants while …

Deep Learning for MIR Tutorial

A Schindler, T Lidy, S Böck - arXiv preprint arXiv:2001.05266, 2020 - arxiv.org
Deep Learning has become state of the art in visual computing and continuously emerges
into the Music Information Retrieval (MIR) and audio retrieval domain. In order to bring …