Cross modal audio search and retrieval with joint embeddings based on text and audio

B Elizalde, S Zarar, B Raj - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
… and retrieve audio segments or textual descriptions that … audio metadata, and retrieve the
corresponding audio [1, 2]. An alternative approach is content-based retrieval, where an audio

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
embeddings. Specifically, given a collection of N audio samples with corresponding textual
… }, we aim to learn embedding functions, ψa and ψt, that project each audio sample ai and text …

Learning contextual tag embeddings for cross-modal alignment of audio and tags

X Favory, K Drossos, T Virtanen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
… In this work we propose a method for allowing the textual generalization of cross-modal …
wider range of applications, such as cross-modal retrieval or zero-shot learning. Future work will …

Language-based audio retrieval task in DCASE 2022 challenge

H Xie, S Lipping, T Virtanen - arXiv preprint arXiv:2206.06108, 2022 - arxiv.org
audio retrieval is introduced into as Subtask 6B, which aims to inspire further research into
audio retrieval with unconstrained textual … The final audio embedding is calculated by averag…

Language-based audio retrieval with textual embeddings of tag names

T Pellegrini - … and Classification of Acoustic Scenes and …, 2022 - ut3-toulouseinp.hal.science
… Our main innovation is two-fold: i) we use logits as basic audio embeddings [3], … audio
recordings. We propose to combine the basic audio logit embeddings with the textual embeddings

On metric learning for audio-text cross-modal retrieval

X Mei, X Liu, J Sun, MD Plumbley, W Wang - arXiv preprint arXiv …, 2022 - arxiv.org
… [7] proposed a tag-based audio retrieval system using traditional machine learning … align
audio and textual features to a joint embedding space. Although these tag-based audio retrieval

Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss

Y Xin, D Yang, Y Zou - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
… valid texts that match a specific audio clip. Therefore, we would expect the same audio to
be retrieved for any of these queries and a retrieval … the value-projected frame embedding: …

Joint embeddings with multimodal cues for video-text retrieval

NC Mithun, J Li, F Metze… - … Information Retrieval, 2019 - Springer
… in the joint space by embedding visual and textual features into a … audio in learning the
embedding improves the result slightly. However, as the retrieval performance of individual audio

Learning joint embedding with multimodal cues for cross-modal video-text retrieval

NC Mithun, J Li, F Metze… - … on multimedia retrieval, 2018 - dl.acm.org
audio features by a fusion strategy for efficient retrieval. We also present a modified pairwise
loss to better learn the joint embedding… joint embedding between visual input and textual input…

Open-vocabulary keyword spotting with audio and text embeddings

N Sacchi, A Nanchen, M Jaggi… - … 2019-IEEE International …, 2019 - infoscience.epfl.ch
embeddings of keywords for which no audio samples are available but only their textual
To obtain phone embeddings from audio we have implemented an audio encoder similarly …