Multimodal saliency-based attention for object-based scene analysis

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 62 - Related articles - All 2 versions

[PDF] ieee.org

A comprehensive survey on video saliency detection with auditory information: the audio-visual consistency perceptual is the key!

C Chen, M Song, W Song, L Guo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Video saliency detection (VSD) aims at fast locating the most attractive
objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied …

Cited by 26 - Related articles - All 5 versions

[PDF] arxiv.org

ViNet: Pushing the limits of visual modality for audio-visual saliency prediction

S Jain, P Yarlagadda, S Jyoti, S Karthik… - 2021 IEEE/RSJ …, 2021 - ieeexplore.ieee.org

We propose the ViNet architecture for audio-visual saliency prediction. ViNet is a fully
convolutional encoder-decoder architecture. The encoder uses visual features from a …

Cited by 86 - Related articles - All 13 versions

[PDF] thecvf.com

STAViS: Spatio-temporal audiovisual saliency network

A Tsiami, P Koutras, P Maragos - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

We introduce STAViS, a spatio-temporal audiovisual saliency network that combines
spatio-temporal visual and auditory information in order to efficiently address the problem of …

Cited by 85 - Related articles - All 12 versions

[HTML] nih.gov

BiconNet: An edge-preserved connectivity-based approach for salient object detection

Z Yang, S Soltanian-Zadeh, S Farsiu - Pattern recognition, 2022 - Elsevier

Salient object detection (SOD) is viewed as a pixel-wise saliency modeling task by
traditional deep learning-based methods. A limitation of current SOD models is insufficient …

Cited by 59 - Related articles - All 8 versions

[PDF] researchgate.net

Quaternion-based spectral saliency detection for eye fixation prediction

B Schauerte, R Stiefelhagen - … ECCV 2012: 12th European Conference on …, 2012 - Springer

In recent years, several authors have reported that spectral saliency detection methods
provide state-of-the-art performance in predicting human gaze in images (see, e.g., [1–3]). We …

Cited by 183 - Related articles - All 9 versions

[PDF] arxiv.org

Listen to look into the future: Audio-visual egocentric gaze anticipation

B Lai, F Ryan, W Jia, M Liu, JM Rehg - European Conference on Computer …, 2025 - Springer

Egocentric gaze anticipation serves as a key building block for the emerging capability of
Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals …

Cited by 5 - Related articles - All 2 versions

[PDF] googleapis.com

Attention estimation to control the delivery of data and audio/video content

ML Needham, KL Baum, F Ishtiaq, R Li… - US Patent …, 2017 - Google Patents

A method implemented in a computer system for controlling the delivery of data and
audio/video content. The method delivers primary content to the subscriber device for …

Cited by 72 - Related articles - All 4 versions

[PDF] google.com

A novel lightweight audio-visual saliency model for videos

D Zhu, X Shao, Q Zhou, X Min, G Zhai… - ACM Transactions on …, 2023 - dl.acm.org

Audio information has not been considered an important factor in visual attention models
regardless of many psychological studies that have shown the importance of audio …

Cited by 11 - Related articles - All 2 versions

[PDF] ieee.org

The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction

N Lazaridis, K Georgiadis, F Kalaganis… - IEEE …, 2024 - ieeexplore.ieee.org

The Transformer revolutionized Natural Language Processing and Computer Vision by
effectively capturing contextual relationships in sequential data through its attention …

Cited by 1 - Related articles - All 3 versions
