Sound event detection of weakly labelled data with CNN-Transformer and automatic threshold...

X Mei, X Liu, MD Plumbley, W Wang - … journal on audio, speech, and music …, 2022 - Springer

Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …

Cited by 39  Related articles  All 11 versions

[PDF] arxiv.org

CLAP: learning audio concepts from natural language supervision

B Elizalde, S Deshmukh, M Al Ismail… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Mainstream machine listening models are trained to learn audio concepts under the
paradigm of one class label to many recordings focusing on one task. Learning under such …

Cited by 227  Related articles  All 3 versions

[PDF] arxiv.org

AST: Audio spectrogram transformer

Y Gong, YA Chung, J Glass - arXiv preprint arXiv:2104.01778, 2021 - arxiv.org

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the
main building block for end-to-end audio classification models, which aim to learn a direct …

Cited by 780  Related articles  All 9 versions

[PDF] arxiv.org

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org

The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Cited by 64  Related articles  All 6 versions

[PDF] ieee.org

A comprehensive review of polyphonic sound event detection

TK Chan, CS Chin - IEEE Access, 2020 - ieeexplore.ieee.org

One of the most amazing functions of the human auditory system is the ability to detect all
kinds of sound events in the environment. With the technologies and hardware advances …

Cited by 55  Related articles  All 5 versions

[PDF] arxiv.org

Latent variable sequential set transformers for joint multi-agent motion prediction

R Girgis, F Golemo, F Codevilla, M Weiss… - arXiv preprint arXiv …, 2021 - arxiv.org

Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A
major challenge is to efficiently learn a representation that approximates the true joint …

Cited by 97  Related articles  All 6 versions

[PDF] ieee.org

Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation

I Martín-Morató, A Mesaros - IEEE/ACM transactions on audio …, 2023 - ieeexplore.ieee.org

Crowdsourcing is a popular tool for collecting large amounts of annotated data, but the
specific format of the strong labels necessary for sound event detection is not easily …

Cited by 29  Related articles  All 5 versions

[PDF] arxiv.org

Conditional sound generation using neural discrete time-frequency representation learning

X Liu, T Iqbal, J Zhao, Q Huang… - 2021 IEEE 31st …, 2021 - ieeexplore.ieee.org

Deep generative models have recently achieved impressive performance in speech and
music synthesis. However, compared to the generation of those domain-specific sounds …

Cited by 53  Related articles  All 7 versions

[PDF] arxiv.org

A transformer-based audio captioning model with keyword estimation

Y Koizumi, R Masumura, K Nishida, M Yasuda… - arXiv preprint arXiv …, 2020 - arxiv.org

One of the problems with automated audio captioning (AAC) is the indeterminacy in word
selection corresponding to the audio event/scene. Since one acoustic event/scene can be …

Cited by 69  Related articles  All 8 versions

[PDF] arxiv.org

CMKD: CNN/Transformer-based cross-model knowledge distillation for audio classification

Y Gong, S Khurana, A Rouditchenko… - arXiv preprint arXiv …, 2022 - arxiv.org

Audio classification is an active research area with a wide range of applications. Over the
past decade, convolutional neural networks (CNNs) have been the de-facto standard …

Cited by 30  Related articles  All 2 versions
