Automated audio captioning: An overview of recent progress and new challenges

X Mei, X Liu, MD Plumbley, W Wang - … journal on audio, speech, and music …, 2022 - Springer

Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …

Cited by 42 · Related articles · All 11 versions

Automatic design of machine learning via evolutionary computation: A survey

N Li, L Ma, T Xing, G Yu, C Wang, Y Wen, S Cheng… - Applied Soft …, 2023 - Elsevier

Machine learning (ML), as the most promising paradigm to discover deep
knowledge from data, has been widely applied to practical applications, such as …

Cited by 21 · Related articles · All 3 versions

[PDF] arxiv.org

AudioLDM: Text-to-audio generation with latent diffusion models

H Liu, Z Chen, Y Yuan, X Mei, X Liu, D Mandic… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general
audio based on text descriptions. However, previous studies in TTA have limited generation …

Cited by 305 · Related articles · All 7 versions

[PDF] arxiv.org

CLAP: Learning audio concepts from natural language supervision

B Elizalde, S Deshmukh, M Al Ismail… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Mainstream machine listening models are trained to learn audio concepts under the
paradigm of one class label to many recordings focusing on one task. Learning under such …

Cited by 246 · Related articles · All 3 versions

[PDF] emerald.com

On the use of AI-based tools like ChatGPT to support management research

B Burger, DK Kanbach, S Kraus, M Breier… - European Journal of …, 2023 - emerald.com

Purpose: The article discusses the current relevance of artificial intelligence (AI) in research
and how AI improves various research methods. This article focuses on the practical case …

Cited by 153 · Related articles · All 12 versions

[PDF] arxiv.org

Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation

Y Wu, K Chen, T Zhang, Y Hui… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Contrastive learning has shown remarkable success in the field of multimodal
representation learning. In this paper, we propose a pipeline of contrastive language-audio …

Cited by 262 · Related articles · All 5 versions

[PDF] neurips.cc

Masked autoencoders that listen

PY Huang, H Xu, J Li, A Baevski… - Advances in …, 2022 - proceedings.neurips.cc

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to
self-supervised representation learning from audio spectrograms. Following the Transformer …

Cited by 169 · Related articles · All 5 versions

[PDF] arxiv.org

Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Cited by 220 · Related articles · All 8 versions

[PDF] mit.edu

AST: Audio spectrogram transformer

Y Gong, YA Chung, J Glass - arXiv preprint arXiv:2104.01778, 2021 - arxiv.org

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the
main building block for end-to-end audio classification models, which aim to learn a direct …

Cited by 813 · Related articles · All 9 versions

[PDF] neurips.cc

VATT: Transformers for multimodal self-supervised learning from raw video, audio and text

H Akbari, L Yuan, R Qian… - Advances in …, 2021 - proceedings.neurips.cc

We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …

Cited by 565 · Related articles · All 9 versions
