On the choice of the optimal temporal support for audio classification with Pre-trained embeddings

A Quelennec, M Olvera, G Peeters… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often
used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is …

Advancing natural-language based audio retrieval with PaSST and large audio-caption data sets

P Primus, K Koutini, G Widmer - arXiv preprint arXiv:2308.04258, 2023 - arxiv.org
This work presents a text-to-audio-retrieval system based on pre-trained text and
spectrogram transformers. Our method projects recordings and textual descriptions into a …

Low-complexity audio embedding extractors

F Schmid, K Koutini, G Widmer - 2023 31st European Signal …, 2023 - ieeexplore.ieee.org
Solving tasks such as speaker recognition, music classification, or semantic audio event
tagging with deep learning models typically requires computationally demanding networks …

CP-JKU's submission to Task 6b of the DCASE 2023 Challenge: Audio retrieval with PaSST and GPT-augmented captions

P Primus, K Koutini, G Widmer - 2023 - dcase.community
This technical report describes CP-JKU's submission to the natural-language-based audio
retrieval task of the 2023 DCASE Challenge (Task 6b). Our proposed system uses …

Embedding Compression for Teacher-to-Student Knowledge Transfer

Y Ding, A Lerch - arXiv preprint arXiv:2402.06761, 2024 - arxiv.org
Common knowledge distillation methods require the teacher model and the student model
to be trained on the same task. However, the usage of embeddings as teachers has also …

Inductive Bias in Learning General Audio Representations / submitted by Khaled Koutini

K Koutini - 2022 - epub.jku.at
Machine auditory perception is a critical component in the development of artificial
intelligence systems capable of comprehending their surroundings. Perceiving and …