Cross modal audio search and retrieval with joint embeddings based on text and audio

B Elizalde, S Zarar, B Raj - ICASSP 2019 - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Existing audio search engines use one of two approaches: matching text-text or audio-audio
pairs. In the former, text queries are matched to semantically similar words in an index of …

Audio retrieval with natural language queries

AM Oncescu, A Koepke, JF Henriques, Z Akata… - arXiv preprint arXiv …, 2021 - arxiv.org
We consider the task of retrieving audio using free-form natural language queries. To study
this problem, which has received limited attention in the existing literature, we introduce …

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

Improving content-based audio retrieval by vocal imitation feedback

B Kim, B Pardo - … 2019 - 2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org
Content-based audio retrieval including query-by-example (QBE) and query-by-vocal
imitation (QBV) is useful when search-relevant text labels for the audio are unavailable, or …

Language-based audio retrieval with pre-trained models

X Mei, X Liu, H Liu, J Sun, MD Plumbley… - … and Classification of …, 2022 - dcase.community
This technical report presents a language-based audio retrieval system that we submitted to
Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …

Improving audio-text retrieval via hierarchical cross-modal interaction and auxiliary captions

Y Xin, Y Zou - arXiv preprint arXiv:2307.15344, 2023 - arxiv.org
Most existing audio-text retrieval (ATR) methods focus on constructing contrastive pairs
between whole audio clips and complete caption sentences, while ignoring fine-grained …

Audio-text retrieval in context

S Lou, X Xu, M Wu, K Yu - ICASSP 2022 - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Audio-text retrieval based on natural language descriptions is a challenging task. It involves
learning cross-modality alignments between long sequences under inadequate data …

Retrieval-augmented text-to-audio generation

Y Yuan, H Liu, X Liu, Q Huang… - ICASSP 2024 - 2024 …, 2024 - ieeexplore.ieee.org
Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art
models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such …

Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss

Y Xin, D Yang, Y Zou - ICASSP 2023 - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
In text-audio retrieval (TAR) tasks, due to the heterogeneity of contents between text and
audio, the semantic information contained in the text is only similar to certain frames within …

Contrastive audio-language learning for music

I Manco, E Benetos, E Quinton, G Fazekas - arXiv preprint arXiv …, 2022 - arxiv.org
As one of the most intuitive interfaces known to humans, natural language has the potential
to mediate many tasks that involve human-computer interaction, especially in application …