J Zhou, L Zheng, Y Zhong, S Hao… - Proceedings of the …, 2021 - openaccess.thecvf.com
Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs). Given a video, we aim to localize video segments containing an AVE and identify its …