Weakly supervised audio-visual violence detection

P Wu, X Liu, J Liu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
Violence detection in videos has strong practical promise given the massive volume of video produced in recent years. Most previous works define violence detection as a simple video classification task and rely on a single modality, e.g., the visual signal, from small-scale datasets. However, such solutions are insufficient. To mitigate this problem, we study weakly supervised violence detection on large-scale audio-visual violence data, and first introduce two complementary tasks, i.e., coarse-grained violent frame detection and fine-grained violent event detection, to advance simple violence video classification to frame-level violent event localization, which aims to accurately locate violent events in untrimmed videos. We then propose a novel network that takes audio-visual data as input and contains three parallel branches that capture different relationships among video snippets and further integrate features: the similarity branch and the proximity branch capture long-range dependencies using a similarity prior and a proximity prior, respectively, and the score branch dynamically captures the closeness of predicted scores. In both the coarse-grained and fine-grained tasks, our approach outperforms other state-of-the-art approaches on two public datasets. Moreover, experimental results also show the positive effect of audio-visual input and relationship modeling.
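The three-branch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the priors, so the cosine-similarity weighting, the exponential decay over temporal distance, the `1 - |s_i - s_j|` closeness measure, the row normalization, and the averaging used to fuse the branches are all assumptions, and every function name is hypothetical. The snippet features and scores are toy values.

```python
import math


def row_normalize(matrix):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_adjacency(features):
    # Assumption: the similarity prior is non-negative cosine similarity.
    return row_normalize(
        [[max(cosine(u, v), 0.0) for v in features] for u in features]
    )


def proximity_adjacency(num_snippets, sigma=1.0):
    # Assumption: the proximity prior decays exponentially with the
    # temporal distance |i - j| between snippets.
    return row_normalize(
        [[math.exp(-abs(i - j) / sigma) for j in range(num_snippets)]
         for i in range(num_snippets)]
    )


def score_adjacency(scores):
    # Assumption: closeness of predicted scores is 1 - |s_i - s_j|,
    # with scores in [0, 1].
    return row_normalize([[1.0 - abs(a - b) for b in scores] for a in scores])


def aggregate(adjacency, features):
    """Replace each snippet feature by a weighted average over all snippets."""
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        for row in adjacency
    ]


# Toy input: 4 snippets with 2-D features and per-snippet violence scores.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = [0.1, 0.2, 0.9, 0.8]

sim = similarity_adjacency(features)
prox = proximity_adjacency(len(features), sigma=1.0)
score = score_adjacency(scores)

# Assumption: branch outputs are fused by a plain element-wise average.
aggregated = [aggregate(adj, features) for adj in (sim, prox, score)]
fused = [
    [sum(branch[i][d] for branch in aggregated) / len(aggregated)
     for d in range(len(features[0]))]
    for i in range(len(features))
]
```

In this sketch each branch builds a snippet-to-snippet weight matrix from a different cue (feature similarity, temporal distance, or predicted score), so a snippet can borrow evidence from distant but related snippets. A trained model would learn the features and scores rather than take them as fixed inputs.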