We consider the task of retrieving audio using free-form natural language queries. To study this problem, which has received limited attention in the existing literature, we introduce …
The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the goal is to retrieve the audio content from a pool of candidates that best matches a given …
B Kim, B Pardo - … 2019-2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org
Content-based audio retrieval, including query-by-example (QBE) and query-by-vocal imitation (QBV), is useful when search-relevant text labels for the audio are unavailable, or …
This technical report presents a language-based audio retrieval system that we submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …
Y Xin, Y Zou - arXiv preprint arXiv:2307.15344, 2023 - arxiv.org
Most existing audio-text retrieval (ATR) methods focus on constructing contrastive pairs between whole audio clips and complete caption sentences, while ignoring fine-grained …
S Lou, X Xu, M Wu, K Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Audio-text retrieval based on natural language descriptions is a challenging task. It involves learning cross-modality alignments between long sequences under inadequate data …
Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such …
Y Xin, D Yang, Y Zou - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
In text-audio retrieval (TAR) tasks, due to the heterogeneity of contents between text and audio, the semantic information contained in the text is only similar to certain frames within …
As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application …