L Pham, D Ngo, PX Nguyen, T Hoang… - arXiv preprint arXiv …, 2021 - researchgate.net
This paper presents a task of audio-visual scene classification (SC) where input videos are
classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music …