A multi-scale self-supervised hypergraph contrastive learning framework for video question answering

Authors

Zheng Wang, Bin Wu, Kaoru Ota, Mianxiong Dong, He Li

Publication date

2023/11/1

Journal

Neural Networks

Volume

168

Pages

272-286

Publisher

Pergamon

Description

Video question answering (VideoQA) is a challenging video understanding task that requires a comprehensive understanding of multimodal information and accurate answers to related questions. Most existing VideoQA models use Graph Neural Networks (GNNs) to capture temporal–spatial interactions between objects. Despite their success, we argue that current schemes have two limitations: (i) existing graph-based methods must stack multiple GNN layers to capture high-order relations between objects, which inevitably introduces irrelevant noise; (ii) they neglect the unique self-supervised signals in the high-order relational structures among multiple objects, which can facilitate more accurate QA. To this end, we propose a novel Multi-scale Self-supervised Hypergraph Contrastive Learning (MSHCL) framework for VideoQA. Specifically, we first segment the video from multiple temporal dimensions …

Total citations

Cited by 1 (2024: 1)
