textual embeddings audio retrieval - Academic Resource Search

Cross modal audio search and retrieval with joint embeddings based on text and audio

B Elizalde, S Zarar, B Raj - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

… and retrieve audio segments or textual descriptions that … audio metadata, and retrieve the
corresponding audio [1, 2]. An alternative approach is content-based retrieval, where an audio …

Cited by 56 · Related articles · All 2 versions

[PDF] arxiv.org

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

… embeddings. Specifically, given a collection of N audio samples with corresponding textual
… }, we aim to learn embedding functions, ψa and ψt, that project each audio sample ai and text …

Cited by 75 · Related articles · All 10 versions

[PDF] arxiv.org

Learning contextual tag embeddings for cross-modal alignment of audio and tags

X Favory, K Drossos, T Virtanen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

… In this work we propose a method for allowing the textual generalization of cross-modal …
wider range of applications, such as cross-modal retrieval or zero-shot learning. Future work will …

Cited by 20 · Related articles · All 7 versions

[PDF] arxiv.org

Language-based audio retrieval task in DCASE 2022 challenge

H Xie, S Lipping, T Virtanen - arXiv preprint arXiv:2206.06108, 2022 - arxiv.org

… audio retrieval is introduced as Subtask 6B, which aims to inspire further research into
audio retrieval with unconstrained textual … The final audio embedding is calculated by averag…

Cited by 14 · Related articles · All 7 versions

[PDF] hal.science

Language-based audio retrieval with textual embeddings of tag names

T Pellegrini - … and Classification of Acoustic Scenes and …, 2022 - ut3-toulouseinp.hal.science

… Our main innovation is two-fold: i) we use logits as basic audio embeddings [3], … audio
recordings. We propose to combine the basic audio logit embeddings with the textual embeddings …

Cited by 2 · Related articles · All 16 versions

[PDF] arxiv.org

On metric learning for audio-text cross-modal retrieval

X Mei, X Liu, J Sun, MD Plumbley, W Wang - arXiv preprint arXiv …, 2022 - arxiv.org

… [7] proposed a tag-based audio retrieval system using traditional machine learning … align
audio and textual features to a joint embedding space. Although these tag-based audio retrieval …

Cited by 48 · Related articles · All 9 versions

[PDF] arxiv.org

Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss

Y Xin, D Yang, Y Zou - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

… valid texts that match a specific audio clip. Therefore, we would expect the same audio to
be retrieved for any of these queries and a retrieval … the value-projected frame embedding: …

Cited by 18 · Related articles · All 5 versions

[PDF] researchgate.net

Joint embeddings with multimodal cues for video-text retrieval

NC Mithun, J Li, F Metze… - … Information Retrieval, 2019 - Springer

… in the joint space by embedding visual and textual features into a … audio in learning the
embedding improves the result slightly. However, as the retrieval performance of individual audio …

Cited by 32 · Related articles · All 4 versions

[PDF] acm.org

Learning joint embedding with multimodal cues for cross-modal video-text retrieval

NC Mithun, J Li, F Metze… - … on multimedia retrieval, 2018 - dl.acm.org

… audio features by a fusion strategy for efficient retrieval. We also present a modified pairwise
loss to better learn the joint embedding… joint embedding between visual input and textual input…

Cited by 280 · Related articles · All 11 versions

[PDF] epfl.ch

Open-vocabulary keyword spotting with audio and text embeddings

N Sacchi, A Nanchen, M Jaggi… - … 2019-IEEE International …, 2019 - infoscience.epfl.ch

… embeddings of keywords for which no audio samples are available but only their textual …
To obtain phone embeddings from audio we have implemented an audio encoder similarly …

Cited by 37 · Related articles · All 9 versions
