Learning to classify video data from classes not included in the training data, i.e., video-based zero-shot learning, is challenging. We conjecture that the natural alignment between the …
Current visual generation methods can produce high-quality videos guided by text prompts. However, effectively controlling object dynamics remains a challenge. This work explores …
J Zhou, X Shen, J Wang, J Zhang, W Sun… - International Journal of …, 2024 - Springer
We propose a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame …
Video-to-audio (V2A) generation leverages visual-only video features to render plausible sounds that match the scene. Importantly, the generated sound onsets should …
We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of …
A Rahimi, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The goal of this paper is speech separation and enhancement in multi-speaker and noisy environments using a combination of different modalities. Previous works have shown good …
Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we …
VS Kadandale, JF Montesinos, G Haro - arXiv preprint arXiv:2204.02090, 2022 - arxiv.org
In this paper, we address the problem of lip-voice synchronisation in videos containing human face and voice. Our approach is based on determining if the lips motion and the …
The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For such videos, the events that may be harnessed for synchronisation cues may be spatially …