The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the goal is to retrieve the audio content from a pool of candidates that best matches a given …
X Xu, Z Xie, M Wu, K Yu - Tech. Rep., DCASE2022 Challenge, 2022 - dcase.community
This technical report describes the system submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 challenge Task 6. There are two involving …
H Xie, S Lipping, T Virtanen - arXiv preprint arXiv:2206.06108, 2022 - arxiv.org
Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset. It was first introduced into DCASE …
S Lou, X Xu, M Wu, K Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Audio-text retrieval based on natural language descriptions is a challenging task. It involves learning cross-modality alignments between long sequences under inadequate data …
This technical report provides a concise overview of our systems submitted to the DCASE Challenge 2023 for tasks 6a, "Automated Audio Captioning" (AAC), and 6b, "Language …
Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …
Y Xin, Y Zou - arXiv preprint arXiv:2307.15344, 2023 - arxiv.org
Most existing audio-text retrieval (ATR) methods focus on constructing contrastive pairs between whole audio clips and complete caption sentences, while ignoring fine-grained …
H Sun, Z Yan, Y Wang, H Dinkel… - Proc. Conf. Detection …, 2023 - dcase.community
This technical report serves as our submission to Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge. Our system, as described in this …
Existing audio search engines use one of two approaches: matching text-text or audio-audio pairs. In the former, text queries are matched to semantically similar words in an index of …