ViTPose++: Vision Transformer for Generic Body Pose Estimation

Authors

Yufei Xu, Jing Zhang, Qiming Zhang, Dacheng Tao

Publication date

2023/11/3

Journal

IEEE Transactions on Pattern Analysis and Machine Intelligence

Publisher

IEEE

Description

In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose. ViTPose employs a plain, non-hierarchical vision transformer as an encoder to encode features and a lightweight decoder to decode body keypoints in either a top-down or a bottom-up manner. It can be scaled to 1B parameters by taking advantage of the scalable model capacity and high parallelism, setting a new Pareto front for throughput and performance. In addition, ViTPose is very flexible regarding the attention type, input resolution, and pre-training and fine-tuning strategy. Based on this flexibility, a novel ViTPose++ model is proposed to deal with heterogeneous body keypoint …

Total citations

Cited by 44

Citations per year: 2023: 20; 2024: 24

Scholar articles

ViTPose++: Vision Transformer Foundation Model for Generic Body Pose Estimation*

Y Xu, J Zhang, Q Zhang, D Tao - arXiv preprint arXiv:2212.04246, 2022

Y Xu, J Zhang, Q Zhang, D Tao - IEEE Transactions on Pattern Analysis and Machine …, 2023

Cited by 19 · All 7 versions