Related articles - Academic resource search

Cross modal audio search and retrieval with joint embeddings based on text and audio

B Elizalde, S Zarar, B Raj - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

Existing audio search engines use one of two approaches: matching text-text or audio-audio
pairs. In the former, text queries are matched to semantically similar words in an index of …

Cited by 56 · Related articles · All 2 versions

[PDF] arxiv.org

Audio retrieval with natural language queries

AM Oncescu, A Koepke, JF Henriques, Z Akata… - arXiv preprint arXiv …, 2021 - arxiv.org

We consider the task of retrieving audio using free-form natural language queries. To study
this problem, which has received limited attention in the existing literature, we introduce …

Cited by 75 · Related articles · All 13 versions

[PDF] arxiv.org

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

Cited by 80 · Related articles · All 10 versions

[PDF] bongjunkim.com

Improving content-based audio retrieval by vocal imitation feedback

B Kim, B Pardo - … 2019-2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org

Content-based audio retrieval including query-by-example (QBE) and query-by-vocal
imitation (QBV) is useful when search-relevant text labels for the audio are unavailable, or …

Cited by 21 · Related articles · All 6 versions

[PDF] dcase.community

[PDF] Language-based audio retrieval with pre-trained models

X Mei, X Liu, H Liu, J Sun, MD Plumbley… - … and Classification of …, 2022 - dcase.community

This technical report presents a language-based audio retrieval system that we submitted to
Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2022 Task …

Cited by 20 · Related articles · All 2 versions

[PDF] arxiv.org

Improving audio-text retrieval via hierarchical cross-modal interaction and auxiliary captions

Y Xin, Y Zou - arXiv preprint arXiv:2307.15344, 2023 - arxiv.org

Most existing audio-text retrieval (ATR) methods focus on constructing contrastive pairs
between whole audio clips and complete caption sentences, while ignoring fine-grained …

Cited by 4 · Related articles · All 4 versions

[PDF] arxiv.org

Audio-text retrieval in context

S Lou, X Xu, M Wu, K Yu - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

Audio-text retrieval based on natural language descriptions is a challenging task. It involves
learning cross-modality alignments between long sequences under inadequate data …

Cited by 25 · Related articles · All 5 versions

[PDF] arxiv.org

Retrieval-augmented text-to-audio generation

Y Yuan, H Liu, X Liu, Q Huang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art
models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such …

Cited by 5 · Related articles · All 5 versions

[PDF] arxiv.org

Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss

Y Xin, D Yang, Y Zou - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

In text-audio retrieval (TAR) tasks, due to the heterogeneity of contents between text and
audio, the semantic information contained in the text is only similar to certain frames within …

Cited by 19 · Related articles · All 5 versions

[PDF] arxiv.org

Contrastive audio-language learning for music

I Manco, E Benetos, E Quinton, G Fazekas - arXiv preprint arXiv …, 2022 - arxiv.org

As one of the most intuitive interfaces known to humans, natural language has the potential
to mediate many tasks that involve human-computer interaction, especially in application …

Cited by 33 · Related articles · All 8 versions
