Language-based audio retrieval with pre-trained models

X Mei, X Liu, H Liu, J Sun, MD Plumbley… - … and Classification of …, 2022 - dcase.community
This technical report presents a language-based audio retrieval system that we submitted to
Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

The SJTU system for DCASE2022 challenge task 6: Audio captioning with audio-text retrieval pre-training

X Xu, Z Xie, M Wu, K Yu - Tech. Rep., DCASE2022 Challenge, 2022 - dcase.community
This technical report describes the system submitted to the Detection and Classification of
Acoustic Scenes and Events (DCASE) 2022 challenge Task 6. There are two involving …

Language-based audio retrieval task in DCASE 2022 challenge

H Xie, S Lipping, T Virtanen - arXiv preprint arXiv:2206.06108, 2022 - arxiv.org
Language-based audio retrieval is a task, where natural language textual captions are used
as queries to retrieve audio signals from a dataset. It has been first introduced into DCASE …

Audio-text retrieval in context

S Lou, X Xu, M Wu, K Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Audio-text retrieval based on natural language descriptions is a challenging task. It involves
learning cross-modality alignments between long sequences under inadequate data …

IRIT-UPS DCASE 2023 audio captioning and retrieval system

E Labbé, T Pellegrini, J Pinquier - Proc. Conf. Detection …, 2023 - dcase.community
This technical report provides a concise overview of our systems submitted to the DCASE
Challenge 2023 for tasks 6a, "Automated Audio Captioning" (AAC), and 6b, "Language …

Audio retrieval with WavText5K and CLAP training

S Deshmukh, B Elizalde, H Wang - arXiv preprint arXiv:2209.14275, 2022 - arxiv.org
Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a
database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …

Improving audio-text retrieval via hierarchical cross-modal interaction and auxiliary captions

Y Xin, Y Zou - arXiv preprint arXiv:2307.15344, 2023 - arxiv.org
Most existing audio-text retrieval (ATR) methods focus on constructing contrastive pairs
between whole audio clips and complete caption sentences, while ignoring fine-grained …

Leveraging multi-task training and image retrieval with CLAP for audio captioning

H Sun, Z Yan, Y Wang, H Dinkel… - Proc. Conf. Detection …, 2023 - dcase.community
This technical report serves as our submission to Task 6 of the Detection and Classification
of Acoustic Scenes and Events (DCASE) 2023 challenge. Our system, as described in this …

Cross modal audio search and retrieval with joint embeddings based on text and audio

B Elizalde, S Zarar, B Raj - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Existing audio search engines use one of two approaches: matching text-text or audio-audio
pairs. In the former, text queries are matched to semantically similar words in an index of …