Authors
Manuel Benavent-Lledo, Sergiu Oprea, John Alejandro Castro-Vargas, David Mulero-Perez, Jose Garcia-Rodriguez
Publication date
2022/7/18
Conference
2022 International Joint Conference on Neural Networks (IJCNN)
Pages
1-7
Publisher
IEEE
Description
Egocentric videos provide a rich source of hand-object interactions that support action recognition. However, prior to action recognition, one may need to detect the presence of hands and objects in the scene. In this work, we propose an action estimation architecture based on the simultaneous detection of the hands and objects in the scene. For hand and object detection, we have adapted the well-known YOLO architecture, leveraging its inference speed and accuracy, and experimentally determined the best-performing variant for our task. After obtaining the hand and object bounding boxes, we select the objects most likely to be interacted with, i.e., the objects closest to a hand. This rough estimation of the objects closest to a hand is a direct approach to determining hand-object interaction. After identifying the scene, and alongside a set of per-object and global actions, we could determine the most suitable action we …
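The closest-object heuristic summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned (x1, y1, x2, y2) bounding boxes from a YOLO-style detector and uses Euclidean centroid distance as a rough proxy for hand-object interaction; the helper names centroid and closest_object are hypothetical.

    # Sketch only (not the paper's code): pick the detected object whose
    # bounding-box centroid lies nearest to a detected hand.
    from math import hypot

    def centroid(box):
        """Center (x, y) of an axis-aligned box given as (x1, y1, x2, y2)."""
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    def closest_object(hand_box, object_boxes):
        """Return (index, distance) of the object centroid nearest to the hand centroid."""
        hx, hy = centroid(hand_box)
        best_idx, best_dist = None, float("inf")
        for i, obj_box in enumerate(object_boxes):
            ox, oy = centroid(obj_box)
            d = hypot(ox - hx, oy - hy)
            if d < best_dist:
                best_idx, best_dist = i, d
        return best_idx, best_dist

    # Example with one hand box and two object boxes (pixel coordinates).
    hand = (100, 120, 160, 200)
    objects = [(300, 310, 360, 380), (150, 180, 220, 260)]
    idx, dist = closest_object(hand, objects)
    print(f"closest object index: {idx}, centroid distance: {dist:.1f} px")

In this toy example the second object is selected because its centroid is nearer to the hand; the paper then uses the selected object, together with per-object and global action sets, to estimate the most suitable action.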
Total citations
Scholar articles
M Benavent-Lledo, S Oprea, JA Castro-Vargas… - 2022 International Joint Conference on Neural …, 2022