Video question answering (VideoQA) is a complex task that requires diverse multi-modal data for training. Manual annotation of questions and answers for videos, however, is tedious …
Recent methods for visual question answering rely on large-scale annotated datasets. Manual annotation of questions and answers for videos, however, is tedious, expensive and …
Building a universal video-language model for solving various video understanding tasks (e.g., text-video retrieval, video question answering) is an open challenge to the machine …
G Chen, X Liu, G Wang, K Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video-language pre-trained models have shown remarkable success in guiding video question-answering (VideoQA) tasks. However, due to the length of video sequences …
Security inspection often deals with a piece of baggage or suitcase where objects are heavily overlapped with each other, resulting in unsatisfactory performance for prohibited …
H Wang, ZJ Zha, L Li, D Liu… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We address the problem of localizing a specific moment described by a natural language query. Existing works make the query interact with either video frames or moment proposals, and …
L Gao, Y Lei, P Zeng, J Song, M Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recently, integrating vision and language for in-depth video understanding, e.g., video captioning and video question answering, has become a promising direction for artificial …
W Yu, H Zheng, M Li, L Ji, L Wu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recent advances in the video question answering (i.e., VideoQA) task have achieved strong success by following the paradigm of fine-tuning each clip-text pair independently on the …