Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …
The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the goal is to retrieve the audio content from a pool of candidates that best matches a given …
This technical report presents a language-based audio retrieval system that we submitted to Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …
This technical report provides a concise overview of our systems submitted to the DCASE Challenge 2023 for tasks 6a, “Automated Audio Captioning” (AAC), and 6b, “Language …
We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce …
JH Cho, YA Park, J Kim, JH Chang - Proc. Detection and …, 2023 - dcase.community
This paper presents the automated audio captioning model for participating in the Detection and Classification of Acoustic Scenes and Events 2023 Challenge Task 6A. The model …
W Yuan, Q Han, D Liu, X Li, Z Yang - Tech. Rep., DCASE …, 2021 - dcase.community
This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge, Task 6: automated audio …
Q Han, W Yuan, D Liu, X Li, Z Yang - DCASE, 2021 - dcase.community
Audio captioning is a multi-modal task, focusing on generating a natural sentence to describe the content in an audio clip. This paper proposes a solution of automated audio …
X Xu, H Dinkel, M Wu, K Yu - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Automated Audio Captioning is a cross-modal task, generating natural language descriptions to summarize the audio clips' sound events. However, grounding the actual …