The epic-kitchens dataset: Collection, challenges and baselines

H Zhu, H Wei, B Li, X Yuan, N Kehtarnavaz - Applied Sciences, 2020 - mdpi.com

Although there are well established object detection methods based on static images, their
application to video data on a frame by frame basis faces two shortcomings: (i) lack of …

Cited by 122 · Related articles · All 10 versions

Beyond supervised learning for pervasive healthcare

X Gu, F Deligianni, J Han, X Liu, W Chen… - IEEE Reviews in …, 2023 - ieeexplore.ieee.org

The integration of machine/deep learning and sensing technologies is transforming
healthcare and medical practice. However, inherent limitations in healthcare data, namely …

Cited by 11 · Related articles · All 4 versions

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 805 · Related articles · All 13 versions

PointOdyssey: A large-scale synthetic dataset for long-term point tracking

Y Zheng, AW Harley, B Shen… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework,
for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to …

Cited by 69 · Related articles · All 5 versions

Cafe: Learning to condense dataset by aligning features

K Wang, B Zhao, X Peng, Z Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Dataset condensation aims at reducing the network training effort through condensing a
cumbersome training set into a compact synthetic one. State-of-the-art approaches largely …

Cited by 199 · Related articles · All 8 versions

Memvit: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com

While today's video recognition systems parse snapshots or short clips accurately, they
cannot connect the dots and reason across a longer range of time yet. Most existing video …

Cited by 198 · Related articles · All 5 versions

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org

Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Cited by 212 · Related articles · All 2 versions

Hybrid relation guided set matching for few-shot action recognition

X Wang, S Zhang, Z Qing, M Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Current few-shot action recognition methods reach impressive performance by learning
discriminative features for each video via episodic training and designing various temporal …

Cited by 91 · Related articles · All 6 versions

Molo: Motion-augmented long-short contrastive learning for few-shot action recognition

X Wang, S Zhang, Z Qing, C Gao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Current state-of-the-art approaches for few-shot action recognition achieve promising
performance by conducting frame-level matching on learned visual features. However, they …

Cited by 52 · Related articles · All 6 versions

H2o: Two hands manipulating objects for first person interaction recognition

T Kwon, B Tekin, J Stühmer, F Bogo… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present a comprehensive framework for egocentric interaction recognition using
markerless 3D annotations of two hands manipulating objects. To this end, we propose a …

Cited by 150 · Related articles · All 6 versions
