UCF-STAR: A large scale still image dataset for understanding human actions

M Safaei, P Balouchian, H Foroosh - … of the AAAI conference on artificial …, 2020 - ojs.aaai.org
Proceedings of the AAAI Conference on Artificial Intelligence, 2020
Abstract
Action recognition in still images poses a great challenge due to (i) the scarcity of available training data and (ii) the absence of temporal information. To address the first challenge, we introduce a dataset for STill image Action Recognition (STAR), containing over 1M images across 50 different human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images. The key characteristics of UCF-STAR include (1) focusing on human body-motion rather than relatively static human-object interaction categories, (2) collecting images from the wild to benefit from a varied set of action representations, (3) appending multiple human-annotated labels per image rather than just the action label, and (4) including a rich, structured, and multi-modal set of metadata for each image. This departs from existing datasets, which typically provide a single annotation per image, cover fewer images and categories, and include no metadata. UCF-STAR exposes the intrinsic difficulty of action recognition through its realistic scene and action complexity. To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of "latent" motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on five widely used datasets.