N Li, L Ma, T Xing, G Yu, C Wang, Y Wen, S Cheng… - Applied Soft …, 2023 - Elsevier
Abstract: Machine learning (ML), as the most promising paradigm to discover deep knowledge from data, has been widely applied to practical applications, such as …
Text-to-audio (TTA) systems have recently gained attention for their ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation …
Mainstream machine listening models are trained to learn audio concepts under the paradigm of one class label to many recordings focusing on one task. Learning under such …
Purpose: The article discusses the current relevance of artificial intelligence (AI) in research and how AI improves various research methods. This article focuses on the practical case …
Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio …
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer …
Recent advances in transformer-based architectures have shown promise in several machine learning tasks. In the audio domain, such architectures have been successfully …
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct …
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …