Learning general audio representations with large-scale training of patchout audio transformers

A Quelennec, M Olvera, G Peeters… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often
used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is …

Cited by 2 · Related articles · All 18 versions

Advancing natural-language based audio retrieval with PaSST and large audio-caption data sets

P Primus, K Koutini, G Widmer - arXiv preprint arXiv:2308.04258, 2023 - arxiv.org

This work presents a text-to-audio-retrieval system based on pre-trained text and
spectrogram transformers. Our method projects recordings and textual descriptions into a …

Cited by 14 · Related articles · All 3 versions

Low-complexity audio embedding extractors

F Schmid, K Koutini, G Widmer - 2023 31st European Signal …, 2023 - ieeexplore.ieee.org

Solving tasks such as speaker recognition, music classification, or semantic audio event
tagging with deep learning models typically requires computationally demanding networks …

Cited by 6 · Related articles · All 5 versions

CP-JKU's submission to Task 6b of the DCASE2023 challenge: Audio retrieval with PaSST and GPT-augmented captions

P Primus, K Koutini, G Widmer - 2023 - dcase.community

This technical report describes CP-JKU's submission to the natural-language-based audio
retrieval task of the 2023 DCASE Challenge (Task 6b). Our proposed system uses …

Cited by 9 · Related articles

Embedding Compression for Teacher-to-Student Knowledge Transfer

Y Ding, A Lerch - arXiv preprint arXiv:2402.06761, 2024 - arxiv.org

Common knowledge distillation methods require the teacher model and the student model
to be trained on the same task. However, the usage of embeddings as teachers has also …

Cited by 1 · Related articles · All 3 versions

Inductive Bias in Learning General Audio Representations

K Koutini - 2022 - epub.jku.at

Abstract Machine auditory perception is a critical component in the development of artificial
intelligence systems capable of comprehending their surroundings. Perceiving and …
