Audio set classification with attention model: A probabilistic perspective

Perceiver: General perception with iterative attention

A Jaegle, F Gimeno, A Brock… - International …, 2021 - proceedings.mlr.press

Biological systems understand the world by simultaneously processing high-dimensional
inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The …

Cited by 880 | Related articles | All 7 versions

[PDF] surrey.ac.uk

PANNs: Large-scale pretrained audio neural networks for audio pattern recognition

Q Kong, Y Cao, T Iqbal, Y Wang… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org

Audio pattern recognition is an important research topic in the machine learning area, and
includes several tasks such as audio tagging, acoustic scene classification, music …

Cited by 1080 | Related articles | All 8 versions

[PDF] arxiv.org

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org

The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

Cited by 77 | Related articles | All 6 versions

[PDF] researchgate.net

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Cited by 184 | Related articles | All 8 versions

[PDF] arxiv.org

PSLA: Improving audio tagging with pretraining, sampling, labeling, and aggregation

Y Gong, YA Chung, J Glass - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

Audio tagging is an active research area and has a wide range of applications. Since the
release of AudioSet, great progress has been made in advancing model performance, which …

Cited by 161 | Related articles | All 6 versions

[PDF] arxiv.org

Unified multisensory perception: Weakly-supervised audio-visual video parsing

Y Tian, D Li, C Xu - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer

In this paper, we introduce a new problem, named audio-visual video parsing, which aims to
parse a video into temporal event segments and label them as either audible, visible, or …

Cited by 170 | Related articles | All 9 versions

[PDF] arxiv.org

Audio retrieval with natural language queries: A benchmark study

AS Koepke, AM Oncescu, JF Henriques… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …

Cited by 93 | Related articles | All 10 versions
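Text-to-audio retrieval of this kind is commonly framed as ranking. The query and every candidate are embedded into a shared vector space, and the candidates are then sorted by similarity. The following is a minimal sketch, not the benchmark's own code. It assumes the embeddings already exist (the text and audio encoders are not specified here) and uses cosine similarity, with Recall@K as an assumed evaluation metric. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query: Vector, candidates: List[Vector]) -> List[int]:
    """Return candidate indices ordered from best to worst match."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries: List[Vector], candidates: List[Vector], k: int) -> float:
    """Fraction of queries whose correct item appears in the top k.

    Hypothetical convention: the ground-truth match for queries[j]
    is candidates[j].
    """
    hits = sum(
        1 for j, q in enumerate(queries) if j in rank_candidates(q, candidates)[:k]
    )
    return hits / len(queries)


if __name__ == "__main__":
    text_queries = [[1.0, 0.0], [0.0, 1.0]]
    audio_clips = [[0.1, 0.9], [0.9, 0.1]]  # deliberately swapped order
    print(recall_at_k(text_queries, audio_clips, k=1))  # 0.0
    print(recall_at_k(text_queries, audio_clips, k=2))  # 1.0
```

Audio-to-text retrieval, the other direction mentioned in the entry, uses the same ranking with the roles of queries and candidates exchanged.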

[PDF] cmu.edu

A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling

Y Wang, J Li, F Metze - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org

Sound event detection (SED) entails two subtasks: recognizing what types of sound events
are present in an audio stream (audio tagging), and pinpointing their onset and offset times …

Cited by 207 | Related articles | All 6 versions
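Under weak labeling, only clip-level labels are available. A pooling function therefore has to aggregate frame-level event probabilities into one clip-level probability for audio tagging. The five functions compared in this paper are max, average, linear softmax, exponential softmax and attention pooling. Below is a minimal sketch of each for a single event class. It assumes the frame probabilities come from some frame-level model that is not shown. The attention weights are passed in directly rather than produced by a learned layer as they would be in a trained network, and the function names are hypothetical.

```python
import math
from typing import Sequence

# y: frame-level probabilities y_i in (0, 1) for ONE event class in one clip.


def max_pool(y: Sequence[float]) -> float:
    # Clip score is the single most confident frame.
    return max(y)


def average_pool(y: Sequence[float]) -> float:
    # Every frame contributes equally.
    return sum(y) / len(y)


def linear_softmax_pool(y: Sequence[float]) -> float:
    # Each frame is weighted by its own probability: sum(y_i^2) / sum(y_i).
    denom = sum(y)
    return sum(v * v for v in y) / denom if denom > 0 else 0.0


def exp_softmax_pool(y: Sequence[float]) -> float:
    # Each frame is weighted by exp(y_i).
    weights = [math.exp(v) for v in y]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


def attention_pool(y: Sequence[float], w: Sequence[float]) -> float:
    # w_i: non-negative attention weights (hypothetical values here; a real
    # model would compute them with a learned layer).
    return sum(wi * v for wi, v in zip(w, y)) / sum(w)


if __name__ == "__main__":
    frames = [0.1, 0.9, 0.2, 0.8]
    print(max_pool(frames))                          # 0.9
    print(average_pool(frames))                      # about 0.5
    print(linear_softmax_pool(frames))               # about 0.75
    print(exp_softmax_pool(frames))                  # between 0.5 and 0.75
    print(attention_pool(frames, [0.0, 1.0, 0.0, 1.0]))  # about 0.85
```

Linear softmax weights each frame by its own probability. Confident frames therefore dominate the clip score while weaker frames still contribute, and the result always lies between the average and the maximum.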

[PDF] arxiv.org

Contrastive positive sample propagation along the audio-visual event line

J Zhou, D Guo, M Wang - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org

Visual and audio signals often coexist in natural environments, forming audio-visual events
(AVEs). Given a video, we aim to localize video segments containing an AVE and identify its …

Cited by 35 | Related articles | All 7 versions

[PDF] arxiv.org

Audio retrieval with natural language queries

AM Oncescu, A Koepke, JF Henriques, Z Akata… - arXiv preprint arXiv …, 2021 - arxiv.org

We consider the task of retrieving audio using free-form natural language queries. To study
this problem, which has received limited attention in the existing literature, we introduce …

Cited by 80 | Related articles | All 13 versions