Global-local temporal representations for video person re-identification

J Li, J Wang, Q Tian, W Gao… - Proceedings of the IEEE/CVF International Conference on …, 2019 - openaccess.thecvf.com
Abstract
This paper proposes the Global-Local Temporal Representation (GLTR) to exploit multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate occlusion and noise in video sequences. The short- and long-term temporal cues are aggregated into the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority to existing features learned with body-part cues or metric learning on four widely used video ReID datasets. For instance, it achieves a Rank-1 accuracy of 87.02% on the MARS dataset without re-ranking, better than the current state of the art.
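The two components described in the abstract map naturally to a compact network. Below is a minimal PyTorch sketch of the idea, assuming per-frame features from a frame-level CNN backbone; the class name GLTRSketch, the feature dimension, the dilation rates (1, 2, 4), the residual fusion, and the average-pooling aggregation are illustrative assumptions and not the authors' exact configuration.

```python
# Minimal sketch of the GLTR idea, assuming per-frame features of shape
# (batch, channels, time) from a frame-level CNN backbone. Layer widths,
# dilation rates, and the fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLTRSketch(nn.Module):
    def __init__(self, dim=2048, dilations=(1, 2, 4)):
        super().__init__()
        # Short-term cues: parallel temporal convolutions with different
        # dilation rates, each branch covering a different temporal scale.
        self.dilated = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        # Long-term relations: temporal self-attention over all frames.
        self.q = nn.Conv1d(dim, dim // 8, 1)
        self.k = nn.Conv1d(dim, dim // 8, 1)
        self.v = nn.Conv1d(dim, dim, 1)

    def forward(self, x):  # x: (B, C, T) per-frame features
        # Sum the parallel dilated branches with a residual connection,
        # modeling short-term cues among adjacent frames.
        short = x + sum(conv(x) for conv in self.dilated)
        # Self-attention: each frame attends to all frames, including
        # inconsecutive ones, capturing long-term relations.
        attn = torch.softmax(
            self.q(short).transpose(1, 2) @ self.k(short), dim=-1)  # (B, T, T)
        long_ = self.v(short) @ attn.transpose(1, 2)                # (B, C, T)
        feat = short + long_
        # Temporal average pooling yields one representation per sequence.
        return F.adaptive_avg_pool1d(feat, 1).squeeze(-1)           # (B, C)

# Usage: feats = GLTRSketch()(torch.randn(2, 2048, 8))  # 8-frame clips
```

Keeping the dilated branches in parallel rather than stacking them lets each branch observe a different temporal scale at the same depth, which is one plausible reading of the multi-scale short-term modeling the abstract describes.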