K-centered Patch Sampling for Efficient Video Recognition

Authors

Seong Hyeon Park, Jihoon Tack, Byeongho Heo, Jung-Woo Ha, Jinwoo Shin

Publication date

2022/10/23

Book

European Conference on Computer Vision

Pages

160-176

Publisher

Springer Nature Switzerland

Description

For decades, it has been common practice to choose a subset of video frames to reduce the computational burden of a video understanding model. In this paper, we argue that this popular heuristic might be sub-optimal under recent transformer-based models. Specifically, inspired by the fact that transformers are built upon patches of video frames, we propose to sample patches rather than frames using the greedy K-center search, i.e., the patch farthest from those chosen so far is sampled iteratively. We then show that a transformer trained with the selected video patches can outperform its baseline trained with the video frames sampled in the traditional way. Furthermore, by adding a certain spatiotemporal structuredness condition, the proposed K-centered patch sampling can even be applied to the recent sophisticated video transformers, boosting their performance further. We demonstrate the superiority of …
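The greedy K-center search mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `video_to_patches` and `greedy_k_center`, the use of Euclidean distance on raw flattened pixel values, the non-overlapping square patches, and the choice of the first center are all assumptions made for illustration.

```python
import numpy as np


def video_to_patches(video, patch):
    """Split a (T, H, W, C) video into flattened, non-overlapping patches.

    Hypothetical helper: assumes H and W are divisible by `patch`.
    Returns an array of shape (num_patches, patch * patch * C).
    """
    t, h, w, c = video.shape
    grid = video.reshape(t, h // patch, patch, w // patch, patch, c)
    # Bring the two patch-grid axes next to the time axis.
    grid = grid.transpose(0, 1, 3, 2, 4, 5)
    return grid.reshape(-1, patch * patch * c)


def greedy_k_center(points, k, first=0):
    """Return indices of k points chosen by farthest-point search.

    Assumption: Euclidean distance between rows of `points`.
    """
    points = np.asarray(points, dtype=np.float64)
    n = points.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    chosen = [first]
    # min_dist[i] is the distance from point i to its nearest chosen center.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        # The next center is the point farthest from everything chosen so far.
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_video = rng.random((4, 32, 32, 3))      # 4 frames of 32x32 RGB
    patches = video_to_patches(toy_video, 16)   # 16 patches in total
    print(greedy_k_center(patches, k=5))        # indices of 5 sampled patches
```

Each iteration costs one pass over all patches, so selecting k of n patches takes O(nk) distance evaluations. The spatiotemporal structuredness condition mentioned in the description is not modeled in this sketch.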

Total citations

Cited by 11

2022: 1, 2023: 6, 2024: 4
