P Kumar, P Rawat, S Chauhan - International Journal of Multimedia …, 2022 - Springer
In the last decade, deep supervised learning has had tremendous success. However, its flaws, such as its dependency on manual and costly annotations on large datasets and …
The massive growth of self-supervised learning (SSL) has been witnessed in language, vision, speech, and audio domains over the past few years. While discrete label prediction is …
Deep learning architectures usually require large scale labeled datasets for achieving good performance on general classification tasks including computer vision and natural language …
While deep learning has enabled great advances in many areas of music, labeled music datasets remain especially hard, expensive, and time-consuming to create. In this work, we …
P Lauha, P Somervuo, P Lehikoinen… - Methods in Ecology …, 2022 - Wiley Online Library
An automatic bird sound recognition system is a useful tool for collecting data of different bird species for ecological analysis. Together with autonomous recording units (ARUs), such …
As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application …
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
Abstract visual reasoning (AVR) domain encompasses problems solving which requires the ability to reason about relations among entities present in a given scene. While humans …
S Atito, M Awais, W Wang… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to …