Food is essential for human life and it is fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding human behavior …
Human activity recognition (HAR) systems attempt to automatically identify and analyze human activities using acquired information from various types of sensors. Although several …
R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Abstract We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to …
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC- KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …
Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models …
F Yi, H Wen, T Jiang - arXiv preprint arXiv:2110.08568, 2021 - arxiv.org
Algorithms for the action segmentation task typically use temporal models to predict what action is occurring at each frame for a minute-long daily activity. Recent studies have shown …
YA Farha, J Gall - Proceedings of the IEEE/CVF conference …, 2019 - openaccess.thecvf.com
Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional …
This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to …
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very …