This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers. Our method projects recordings and textual descriptions into a …
Solving tasks such as speaker recognition, music classification, or semantic audio event tagging with deep learning models typically requires computationally demanding networks …
This technical report describes CP-JKU's submission to the natural-language-based audio retrieval task of the 2023 DCASE Challenge (Task 6b). Our proposed system uses …
Y Ding, A Lerch - arXiv preprint arXiv:2402.06761, 2024 - arxiv.org
Common knowledge distillation methods require the teacher model and the student model to be trained on the same task. However, the usage of embeddings as teachers has also …
Machine auditory perception is a critical component in the development of artificial intelligence systems capable of comprehending their surroundings. Perceiving and …