Exploring multimodal video representation for action recognition

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Cited by 626 Related articles All 16 versions

[PDF] arxiv.org

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 62 Related articles All 2 versions

[PDF] thecvf.com

CAD-contextual multi-modal alignment for dynamic AVQA

A Nadeem, A Hilton, R Dawes… - Proceedings of the …, 2024 - openaccess.thecvf.com

In the context of Audio Visual Question Answering (AVQA) tasks, the audio and visual
modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing …

Cited by 9 Related articles All 6 versions

[PDF] wiley.com

Advances in human action recognition: an updated survey

SAR Abu‐Bakar - IET Image Processing, 2019 - Wiley Online Library

Research in human activity recognition (HAR) has seen tremendous growth and
continuously receiving attention from both the Computer Vision and the Image Processing …

Cited by 38 Related articles All 6 versions

[PDF] academia.edu

Event detection in soccer videos using unsupervised learning of spatio-temporal features based on pooled spatial pyramid model

B Fakhar, H Rashidy Kanan, A Behrad - Multimedia Tools and …, 2019 - Springer

Most existing researches for semantic analysis of soccer videos benefit from special
approaches to bridge the semantic gap between low-level features and high-level events …

Cited by 34 Related articles All 6 versions

[PDF] arxiv.org

A comparative analysis of decision-level fusion for multimodal driver behaviour understanding

A Roitberg, K Peng, Z Marinov… - 2022 IEEE Intelligent …, 2022 - ieeexplore.ieee.org

Visual recognition inside the vehicle cabin leads to safer driving and more intuitive
human-vehicle interaction but such systems face substantial obstacles as they need to capture …

Cited by 16 Related articles All 8 versions

[PDF] researchgate.net

Deep learning based human action recognition: A survey

Z Zhang, X Ma, R Song, X Rong, X Tian… - 2017 Chinese …, 2017 - ieeexplore.ieee.org

Human action recognition has attracted much attentions because of its great potential
applications. With the rapid development of computer performance and Internet, the …

Cited by 32 Related articles All 2 versions

Still image action recognition based on interactions between joints and objects

SS Ashrafi, SB Shokouhi, A Ayatollahi - Multimedia Tools and Applications, 2023 - Springer

Still image-based action recognition is a challenging area in which recognition is performed
based on only a single input image. Utilizing auxiliary information such as pose, object, or …

Cited by 7 Related articles All 3 versions

Non-linear consumption of videos using a sequence of personalized multimodal fragments

G Verma, T Nalamada, K Harpavat, P Goel… - Proceedings of the 26th …, 2021 - dl.acm.org

As videos progressively take a central role in conveying information on the Web, current
linear-consumption methods that involve spending time proportional to the duration of the …

Cited by 14 Related articles

[PDF] arxiv.org

Modselect: Automatic modality selection for synthetic-to-real domain generalization

Z Marinov, A Roitberg, D Schneider… - European Conference on …, 2022 - Springer

Modality selection is an important step when designing multimodal systems, especially in
the case of cross-domain activity recognition as certain modalities are more robust to …

Cited by 5 Related articles All 5 versions
