Toward the introduction of auditory information in dynamic visual attention models

A comprehensive survey on video saliency detection with auditory information: the audio-visual consistency perceptual is the key!

C Chen, M Song, W Song, L Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Video saliency detection (VSD) aims at fast locating the most attractive
objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied …

Cited by 26 - Related articles - All 5 versions

[PDF] um.edu.mo

A multimodal saliency model for videos with high audio-visual correspondence

X Min, G Zhai, J Zhou, XP Zhang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Audio information has been bypassed by most of current visual attention prediction studies.
However, sound could have influence on visual attention and such influence has been …

Cited by 186 - Related articles - All 6 versions

[HTML] arvojournals.org

[HTML] How saliency, faces, and sound influence gaze in dynamic social scenes

A Coutrot, N Guyader - Journal of vision, 2014 - iovs.arvojournals.org

Conversation scenes are a typical example in which classical models of visual attention
dramatically fail to predict eye positions. Indeed, these models rarely consider faces as …

Cited by 185 - Related articles - All 23 versions

[PDF] researchgate.net

Fixation prediction through multimodal analysis

X Min, G Zhai, K Gu, X Yang - ACM Transactions on Multimedia …, 2016 - dl.acm.org

In this article, we propose to predict human eye fixation through incorporating both audio
and visual cues. Traditional visual attention models generally make the utmost of stimuli's …

Cited by 156 - Related articles - All 4 versions

Joint learning of audio–visual saliency prediction and sound source localization on multi-face videos

M Qiao, Y Liu, M Xu, X Deng, B Li, W Hu… - International Journal of …, 2024 - Springer

Visual and audio events simultaneously occur and both attract attention. However, most
existing saliency prediction works ignore the influence of audio and only consider vision …

Cited by 8 - Related articles - All 2 versions

[PDF] arxiv.org

Predicting video saliency with object-to-motion CNN and two-layer convolutional LSTM

L Jiang, M Xu, Z Wang - arXiv preprint arXiv:1709.06316, 2017 - arxiv.org

Over the past few years, deep neural networks (DNNs) have exhibited great success in
predicting the saliency of images. However, there are few works that apply DNNs to predict …

Cited by 86 - Related articles - All 2 versions

Gravitational laws of focus of attention

D Zanca, S Melacci, M Gori - IEEE transactions on pattern …, 2019 - ieeexplore.ieee.org

The understanding of the mechanisms behind focus of attention in a visual scene is a
problem of great interest in visual perception and computer vision. In this paper, we describe …

Cited by 53 - Related articles - All 5 versions

[PDF] arxiv.org

Learning to predict salient faces: A novel visual-audio saliency model

Y Liu, M Qiao, M Xu, B Li, W Hu, A Borji - Computer Vision–ECCV 2020 …, 2020 - Springer

Recently, video streams have occupied a large proportion of Internet traffic, most of which
contain human faces. Hence, it is necessary to predict saliency on multiple-face videos …

Cited by 20 - Related articles - All 5 versions

Saliency Prediction on Mobile Videos: A Fixation Mapping-Based Dataset and A Transformer Approach

S Wen, L Yang, M Xu, M Qiao, T Xu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the booming development of smart devices, mobile videos have drawn broad interest
when humans surf social media. Different from traditional long-form videos, mobile videos …

Cited by 5 - Related articles

[PDF] hal.science

An audiovisual attention model for natural conversation scenes

A Coutrot, N Guyader - 2014 IEEE international conference on …, 2014 - ieeexplore.ieee.org

Classical visual attention models neither consider social cues, such as faces, nor auditory
cues, such as speech. However, faces are known to capture visual attention more than any …

Cited by 40 - Related articles - All 14 versions
