Audio interval retrieval using convolutional neural networks

I Kuzminykh, D Shevchuk, S Shiaeles… - Internet of Things, Smart …, 2020 - Springer
Modern streaming services are increasingly labeling videos based on their visual or audio
content. This typically augments the use of technologies such as AI and ML by allowing to …

Language-based audio retrieval with pre-trained models

X Mei, X Liu, H Liu, J Sun, MD Plumbley… - … 2022 Challenge, Tech …, 2022 - dcase.community
This technical report presents a language-based audio retrieval system that we submitted to
Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …

Temporal convolutional networks for speech and music detection in radio broadcast

Q Lemaire, A Holzapfel - … Retrieval Conference, ISMIR 2019, 4-8 …, 2019 - diva-portal.org
The task of speech and music detection aims at the automatic annotation of potentially
overlapping speech and music segments in audio recordings. This metadata extraction …

Language-based audio retrieval with textual embeddings of tag names

T Pellegrini - … and Classification of Acoustic Scenes and …, 2022 - ut3-toulouseinp.hal.science
Language-based audio retrieval aims to retrieve audio recordings based on a queried
caption, formulated as a free-form sentence written in natural language. To perform this task …

Multi-Label Sound Event Retrieval Using A Deep Learning-Based Siamese Structure with A Pairwise Presence Matrix

J Fan, E Nichols, D Tompkins… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Realistic recordings of soundscapes often have multiple sound events co-occurring, such as
car horns, engine and human voices. Sound event retrieval is a type of content-based search …

Cross modal audio search and retrieval with joint embeddings based on text and audio

B Elizalde, S Zarar, B Raj - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Existing audio search engines use one of two approaches: matching text-text or audio-audio
pairs. In the former, text queries are matched to semantically similar words in an index of …

Language-based audio retrieval task in DCASE 2022 challenge

H Xie, S Lipping, T Virtanen - arXiv preprint arXiv:2206.06108, 2022 - arxiv.org
Language-based audio retrieval is a task, where natural language textual captions are used
as queries to retrieve audio signals from a dataset. It has been first introduced into DCASE …

Attention-based convolutional neural network for audio event classification with feature transfer learning

T Chen, U Gupta - CVSSP, 2018 - cvssp.org
Audio event classification is an urgent Content based Information Retrieval (CBIR) unsolved
problem with numerous applications that it can benefit. This paper is explaining Pindrop's …

Audio-text retrieval in context

S Lou, X Xu, M Wu, K Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Audio-text retrieval based on natural language descriptions is a challenging task. It involves
learning cross-modality alignments between long sequences under inadequate data …

Deep CNN framework for audio event recognition using weakly labeled web data

A Kumar, B Raj - arXiv preprint arXiv:1707.02530, 2017 - arxiv.org
The development of audio event recognition systems requires labeled training data, which
are generally hard to obtain. One promising source of recordings of audio events is the large …