Pose-based contrastive learning for domain agnostic activity representations

D Schneider, S Sarfraz, A Roitberg… - Proceedings of the …, 2022 - openaccess.thecvf.com
Proceedings of the IEEE/CVF Conference on Computer Vision and …, 2022
Abstract
While recognition accuracies of video classification models trained on conventional benchmarks are gradually saturating, recent studies raise alarm about the learned representations not generalizing well across different domains. Learning abstract concepts behind an activity instead of overfitting to the appearances and biases of a specific benchmark domain is vital for building generalizable behaviour understanding models. In this paper, we introduce Pose-based High Level View Contrasting (P-HLVC), a novel method that leverages human pose dynamics as a supervision signal aimed at learning domain-invariant activity representations. Our model learns to link image sequences to more abstract body pose information through iterative contrastive clustering and the Sinkhorn-Knopp algorithm, providing us with video representations more resistant to domain shifts. We demonstrate the effectiveness of our approach in a cross-domain action recognition setting and achieve significant improvements on the synthetic-to-real Sims4Action benchmark.
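The abstract names the Sinkhorn-Knopp algorithm as part of the clustering step but gives no details. As a rough illustration only, the sketch below shows how Sinkhorn-Knopp normalization is commonly used in clustering-based contrastive learning to turn sample-to-prototype similarity scores into balanced soft cluster assignments. It is not the authors' implementation: the function name `sinkhorn_knopp`, the parameters `epsilon` and `n_iters`, and the toy score matrix are all assumptions made for this example.

```python
import math


def sinkhorn_knopp(scores, epsilon=0.05, n_iters=3):
    """Map an N x K score matrix to balanced soft assignments.

    Hypothetical helper for illustration; `epsilon` (temperature) and
    `n_iters` are assumed values, not taken from the paper.
    """
    n = len(scores)       # number of samples
    k = len(scores[0])    # number of clusters (prototypes)

    # Subtract the global maximum before exponentiating for stability.
    max_score = max(max(row) for row in scores)
    q = [[math.exp((s - max_score) / epsilon) for s in row] for row in scores]

    # Normalize so the whole matrix sums to 1.
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]

    for _ in range(n_iters):
        # Column step: give every cluster the same total mass, 1 / K.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]

        # Row step: give every sample the same total mass, 1 / N.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[i] * n) for v in row] for i, row in enumerate(q)]

    # Rescale rows so each sample's assignment sums to 1.
    return [[v * n for v in row] for row in q]


# Toy similarity scores: 4 samples, 2 prototypes (made-up numbers).
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.6]]
assignments = sinkhorn_knopp(toy_scores)
for row in assignments:
    print([round(v, 3) for v in row])
```

Each returned row is a probability distribution over clusters, and the alternating column and row normalization pushes the clusters toward equal usage, which discourages the degenerate solution where every sample lands in one cluster. With very large score gaps or a very small `epsilon`, the exponentials can underflow to zero, so a practical version would need extra safeguards.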