Motion informed audio source separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 295 Related articles All 6 versions

[PDF] arxiv.org

VisualVoice: Audio-visual speech separation with cross-modal consistency

R Gao, K Grauman - 2021 IEEE/CVF Conference on Computer …, 2021 - ieeexplore.ieee.org

We introduce a new approach for audio-visual speech separation. Given a video, the goal is
to extract the speech associated with a face in spite of simultaneous background sounds …

Cited by 197 Related articles All 9 versions

[PDF] thecvf.com

Learning to separate object sounds by watching unlabeled video

R Gao, R Feris, K Grauman - Proceedings of the European …, 2018 - openaccess.thecvf.com

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and
sound is challenging: most natural scenes and events contain multiple objects, and the …

Cited by 320 Related articles All 14 versions

[PDF] thecvf.com

2.5D visual sound

R Gao, K Grauman - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com

Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual
experience of the scene. However, binaural recordings are scarcely available and require …

Cited by 248 Related articles All 9 versions

[PDF] thecvf.com

Co-separating sounds of visual objects

R Gao, K Grauman - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

Learning how objects sound from video is challenging, since they often heavily overlap in a
single audio channel. Current methods for visually-guided audio source separation sidestep …

Cited by 232 Related articles All 10 versions

[PDF] thecvf.com

Positive sample propagation along the audio-visual event line

J Zhou, L Zheng, Y Zhong, S Hao… - Proceedings of the …, 2021 - openaccess.thecvf.com

Visual and audio signals often coexist in natural environments, forming audio-visual events
(AVEs). Given a video, we aim to localize video segments containing an AVE and identify its …

Cited by 111 Related articles All 7 versions

[PDF] arxiv.org

Contrastive positive sample propagation along the audio-visual event line

J Zhou, D Guo, M Wang - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org

Visual and audio signals often coexist in natural environments, forming audio-visual events
(AVEs). Given a video, we aim to localize video segments containing an AVE and identify its …

Cited by 52 Related articles All 7 versions

[PDF] arxiv.org

Sep-stereo: Visually guided stereophonic audio generation by associating source separation

H Zhou, X Xu, D Lin, X Wang, Z Liu - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

Stereophonic audio is an indispensable ingredient to enhance human auditory experience.
Recent research has explored the usage of visual information as guidance to generate …

Cited by 93 Related articles All 5 versions

[PDF] ieee.org

Creating a multitrack classical music performance dataset for multimodal music analysis: Challenges, insights, and applications

B Li, X Liu, K Dinesh, Z Duan… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org

We introduce a dataset for facilitating audio-visual analysis of music performances. The
dataset comprises 44 simple multi-instrument classical music pieces assembled from …

Cited by 205 Related articles All 8 versions

[PDF] thecvf.com

Audio-visual speech codecs: Rethinking audio-visual speech enhancement by re-synthesis

K Yang, D Marković, S Krenn… - Proceedings of the …, 2022 - openaccess.thecvf.com

Since facial actions such as lip movements contain significant information about speech
content, it is not surprising that audio-visual speech enhancement methods are more …

Cited by 40 Related articles All 5 versions
