To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We …
X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com
How do humans recognize the action "opening a book"? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships …
TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts …
Current methods for video analysis often extract frame-level features using pre-trained convolutional neural networks (CNNs). Such features are then aggregated over time, e.g., by …
Video captioning is, in essence, a complex natural process affected by various uncertainties stemming from video content, subjective judgment, and so on. In this paper, we …
Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of …
S Lu, X Wei, Y Li, L Wang - 2018 IEEE 16th Intl Conf on …, 2018 - ieeexplore.ieee.org
Nowadays, big data systems are widely adopted across many domains, such as manufacturing, healthcare, education, and media, to offer effective data solutions. Big data …
Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast …
Recently, there has been a lot of interest in building compact models for video classification which have a small memory footprint (< 1 GB). While these models are compact, they …