Audio event-relational graph representation learning for acoustic scene classification

Y Hou, S Song, C Yu, W Wang… - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org
Most deep learning-based acoustic scene classification (ASC) approaches identify scenes from acoustic features converted from audio clips that contain mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulty explaining which cues they use to identify scenes. This letter presents the first study on revealing the relationship between real-life acoustic scenes and semantic embeddings of the most relevant AEs. Specifically, we propose an event-relational graph representation learning (ERGL) framework for ASC that classifies scenes while clearly and directly indicating which cues are used in classification. In the event-relational graph, the embedding of each event is treated as a node, while relational cues derived from each pair of nodes are described by multi-dimensional edge features. Experiments on a real-life ASC dataset show that the proposed ERGL achieves competitive ASC performance by learning embeddings of only a limited number of AEs. The results demonstrate the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph.
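The abstract specifies the graph structure (per-event embeddings as nodes, multi-dimensional edge features derived from each node pair) but not the exact edge-feature definition or network architecture. Below is a minimal PyTorch sketch of that general idea under stated assumptions: edge features come from a small MLP over concatenated node pairs, nodes are updated from aggregated edges, and a mean readout yields scene logits. All names and dimensions (NUM_EVENTS, EMB_DIM, EDGE_DIM, EventRelationalGraphClassifier, etc.) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

NUM_EVENTS = 25   # assumed number of tracked audio events (AEs)
EMB_DIM = 128     # assumed dimension of each per-event embedding
EDGE_DIM = 16     # assumed dimension of the multi-dimensional edge features
NUM_SCENES = 10   # assumed number of acoustic scene classes


class EventRelationalGraphClassifier(nn.Module):
    """Sketch: AE embeddings as nodes, pairwise edge features,
    one message-passing step, and a graph-level scene readout."""

    def __init__(self):
        super().__init__()
        # Edge features derived from each ordered pair of node
        # embeddings via a small MLP (an assumption, not the paper's).
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * EMB_DIM, EDGE_DIM), nn.ReLU())
        # Node update conditioned on aggregated incoming edge features.
        self.node_mlp = nn.Sequential(
            nn.Linear(EMB_DIM + EDGE_DIM, EMB_DIM), nn.ReLU())
        self.classifier = nn.Linear(EMB_DIM, NUM_SCENES)

    def forward(self, nodes):                  # nodes: (B, N, EMB_DIM)
        B, N, D = nodes.shape
        # Build all N*N ordered pairs of node embeddings.
        src = nodes.unsqueeze(2).expand(B, N, N, D)
        dst = nodes.unsqueeze(1).expand(B, N, N, D)
        edges = self.edge_mlp(torch.cat([src, dst], dim=-1))   # (B, N, N, EDGE_DIM)
        # Aggregate edge features per node, then update the nodes.
        agg = edges.mean(dim=2)                                 # (B, N, EDGE_DIM)
        nodes = self.node_mlp(torch.cat([nodes, agg], dim=-1))  # (B, N, EMB_DIM)
        # Graph-level readout -> scene logits.
        return self.classifier(nodes.mean(dim=1))               # (B, NUM_SCENES)


# Usage with random stand-ins for per-event embeddings that would
# normally come from a pretrained audio event model.
model = EventRelationalGraphClassifier()
event_embeddings = torch.randn(4, NUM_EVENTS, EMB_DIM)
print(model(event_embeddings).shape)  # torch.Size([4, 10])
```

Because every node corresponds to a named audio event, the learned edge features in such a design can be inspected per event pair, which is what lets this kind of model indicate which cues drove the scene decision.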