A comprehensive survey on video saliency detection with auditory information: the audio-visual consistency perceptual is the key!

C Chen, M Song, W Song, L Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video saliency detection (VSD) aims at fast locating the most attractive
objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied …

A multimodal saliency model for videos with high audio-visual correspondence

X Min, G Zhai, J Zhou, XP Zhang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Audio information has been bypassed by most of current visual attention prediction studies.
However, sound could have influence on visual attention and such influence has been …

How saliency, faces, and sound influence gaze in dynamic social scenes

A Coutrot, N Guyader - Journal of vision, 2014 - iovs.arvojournals.org
Conversation scenes are a typical example in which classical models of visual attention
dramatically fail to predict eye positions. Indeed, these models rarely consider faces as …

Fixation prediction through multimodal analysis

X Min, G Zhai, K Gu, X Yang - ACM Transactions on Multimedia …, 2016 - dl.acm.org
In this article, we propose to predict human eye fixation through incorporating both audio
and visual cues. Traditional visual attention models generally make the utmost of stimuli's …

Joint learning of audio–visual saliency prediction and sound source localization on multi-face videos

M Qiao, Y Liu, M Xu, X Deng, B Li, W Hu… - International Journal of …, 2024 - Springer
Visual and audio events simultaneously occur and both attract attention. However, most
existing saliency prediction works ignore the influence of audio and only consider vision …

Predicting video saliency with object-to-motion CNN and two-layer convolutional LSTM

L Jiang, M Xu, Z Wang - arXiv preprint arXiv:1709.06316, 2017 - arxiv.org
Over the past few years, deep neural networks (DNNs) have exhibited great success in
predicting the saliency of images. However, there are few works that apply DNNs to predict …

Gravitational laws of focus of attention

D Zanca, S Melacci, M Gori - IEEE transactions on pattern …, 2019 - ieeexplore.ieee.org
The understanding of the mechanisms behind focus of attention in a visual scene is a
problem of great interest in visual perception and computer vision. In this paper, we describe …

Learning to predict salient faces: A novel visual-audio saliency model

Y Liu, M Qiao, M Xu, B Li, W Hu, A Borji - Computer Vision–ECCV 2020 …, 2020 - Springer
Recently, video streams have occupied a large proportion of Internet traffic, most of which
contain human faces. Hence, it is necessary to predict saliency on multiple-face videos …

Saliency Prediction on Mobile Videos: A Fixation Mapping-Based Dataset and A Transformer Approach

S Wen, L Yang, M Xu, M Qiao, T Xu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the booming development of smart devices, mobile videos have drawn broad interest
when humans surf social media. Different from traditional long-form videos, mobile videos …

An audiovisual attention model for natural conversation scenes

A Coutrot, N Guyader - 2014 IEEE international conference on …, 2014 - ieeexplore.ieee.org
Classical visual attention models neither consider social cues, such as faces, nor auditory
cues, such as speech. However, faces are known to capture visual attention more than any …