Weakly labelled AudioSet tagging with attention neural networks

X Mei, X Liu, MD Plumbley, W Wang - … journal on audio, speech, and music …, 2022 - Springer

Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …

Cited by 48 Related articles All 11 versions

[PDF] neurips.cc

VATT: Transformers for multimodal self-supervised learning from raw video, audio and text

H Akbari, L Yuan, R Qian… - Advances in …, 2021 - proceedings.neurips.cc

We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …

Cited by 615 Related articles All 9 versions

[PDF] arxiv.org

WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research

X Mei, C Meng, H Liu, Q Kong, T Ko… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …

Cited by 105 Related articles All 3 versions

[PDF] surrey.ac.uk

PANNs: Large-scale pretrained audio neural networks for audio pattern recognition

Q Kong, Y Cao, T Iqbal, Y Wang… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org

Audio pattern recognition is an important research topic in the machine learning area, and
includes several tasks such as audio tagging, acoustic scene classification, music …

Cited by 1137 Related articles All 8 versions

[PDF] arxiv.org

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org

The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Cited by 91 Related articles All 6 versions

[PDF] hal.science

The internet of audio things: State of the art, vision, and challenges

L Turchet, G Fazekas, M Lagrange… - IEEE internet of …, 2020 - ieeexplore.ieee.org

The Internet of Audio Things (IoAuT) is an emerging research field positioned at the
intersection of the Internet of Things, sound and music computing, artificial intelligence, and …

Cited by 70 Related articles All 8 versions

[PDF] arxiv.org

PSLA: Improving audio tagging with pretraining, sampling, labeling, and aggregation

Y Gong, YA Chung, J Glass - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

Audio tagging is an active research area and has a wide range of applications. Since the
release of AudioSet, great progress has been made in advancing model performance, which …

Cited by 166 Related articles All 6 versions

[PDF] arxiv.org

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

Cited by 99 Related articles All 10 versions

[PDF] arxiv.org

Sound event detection of weakly labelled data with CNN-Transformer and automatic threshold optimization

Q Kong, Y Xu, W Wang… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Sound event detection (SED) is a task to detect sound events in an audio recording. One
challenge of the SED task is that many datasets such as the Detection and Classification of …

Cited by 141 Related articles All 8 versions

[PDF] arxiv.org

Audio captioning transformer

X Mei, X Liu, Q Huang, MD Plumbley… - arXiv preprint arXiv …, 2021 - arxiv.org

Audio captioning aims to automatically generate a natural language description of an audio
clip. Most captioning models follow an encoder-decoder architecture, where the decoder …

Cited by 79 Related articles All 9 versions
