From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Learning spherical convolution for fast features from 360 imagery

YC Su, K Grauman - Advances in neural information …, 2017 - proceedings.neurips.cc
While 360 cameras offer tremendous new possibilities in vision, graphics, and augmented
reality, the spherical images they produce make core feature extraction non-trivial …

SaltiNet: Scan-path prediction on 360 degree images using saliency volumes

M Assens Reina, X Giro-i-Nieto… - Proceedings of the …, 2017 - openaccess.thecvf.com
We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree
images. The model is based on a temporal-aware novel representation of saliency …

A spherical convolution approach for learning long term viewport prediction in 360 immersive video

C Wu, R Zhang, Z Wang, L Sun - … of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org
Viewport prediction for 360 video forecasts a viewer's viewport when he/she watches a 360
video with a head-mounted display, which benefits many VR/AR applications such as 360 …

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

J Xiong, P Zhang, T You, C Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual saliency prediction can draw support from diverse modality complements but
further performance enhancement is still challenged by customized architectures as well as …

Self-view grounding given a narrated 360 video

SH Chou, YC Chen, KH Zeng, HN Hu, J Fu… - Proceedings of the AAAI …, 2018 - ojs.aaai.org
Narrated 360 videos are typically provided in many touring scenarios to mimic real-world
experience. However, previous work has shown that smart assistance (i.e., providing visual …

Descriptor matching for a discrete spherical image with a convolutional neural network

Y Shan, S Li - IEEE Access, 2018 - ieeexplore.ieee.org
In this paper, we propose a method of extracting feature descriptors from discrete spherical
images using convolutional neural networks (CNNs). First, a captured full-view image is …

Scanpath and saliency prediction on 360 degree images

M Assens, X Giro-i-Nieto, K McGuinness… - Signal Processing …, 2018 - Elsevier
We introduce deep neural networks for scanpath and saliency prediction trained on 360-
degree images. The scanpath prediction model called SaltiNet is based on a temporal …

An Integrated System for Spatio-temporal Summarization of 360-Degrees Videos

I Kontostathis, E Apostolidis, V Mezaris - International Conference on …, 2024 - Springer
In this work, we present an integrated system for spatio-temporal summarization of 360-
degrees videos. The video summary production involves the detection of salient events in …

Predicting 360° Video Saliency: A ConvLSTM Encoder-Decoder Network with Spatio-temporal Consistency

Z Wan, H Qin, R Xiong, Z Li, X Fan… - IEEE Journal on …, 2024 - ieeexplore.ieee.org
360° videos have been widely used with the development of virtual reality technology and
triggered a demand to determine the most visually attractive objects in them, aka 360° video …