ViTPose: Simple vision transformer baselines for human pose estimation

Y Xu, J Zhang, Q Zhang, D Tao - Advances in Neural …, 2022 - proceedings.neurips.cc
Although no specific domain knowledge is considered in the design, plain vision
transformers have shown excellent performance in visual recognition tasks. However, little …

InstructDiffusion: A generalist modeling interface for vision tasks

Z Geng, B Yang, T Hang, C Li, S Gu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present InstructDiffusion, a unified and generic framework for aligning computer vision
tasks with human instructions. Unlike existing approaches that integrate prior knowledge …

ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond

Q Zhang, Y Xu, J Zhang, D Tao - International Journal of Computer Vision, 2023 - Springer
Vision transformers have shown great potential in various computer vision tasks owing to
their strong capability to model long-range dependency using the self-attention mechanism …

Animal pose estimation: A closer look at the state-of-the-art, existing gaps and opportunities

L Jiang, C Lee, D Teotia, S Ostadabbas - Computer Vision and Image …, 2022 - Elsevier
Over the past few years, research on animal pose estimation in the computer vision field has
grown in many aspects such as 2D and 3D pose estimation, 3D mesh reconstruction, and …

RTMPose: Real-time multi-person pose estimation based on MMPose

T Jiang, P Lu, L Zhang, N Ma, R Han, C Lyu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies on 2D pose estimation have achieved excellent performance on public
benchmarks, yet its application in the industrial community still suffers from heavy model …

Animal kingdom: A large and diverse dataset for animal behavior understanding

XL Ng, KE Ong, Q Zheng, Y Ni… - Proceedings of the …, 2022 - openaccess.thecvf.com
Understanding animals' behaviors is significant for a wide range of applications. However,
existing animal behavior datasets have limitations in multiple aspects, including limited …

Human pose estimation using deep learning: review, methodologies, progress and future research directions

P Kumar, S Chauhan, LK Awasthi - International Journal of Multimedia …, 2022 - Springer
Human pose estimation (HPE) has developed over the past decade into a vibrant field for
research with a variety of real-world applications like 3D reconstruction, virtual testing and re …

Animal3D: A comprehensive dataset of 3D animal pose and shape

J Xu, Y Zhang, J Peng, W Ma… - Proceedings of the …, 2023 - openaccess.thecvf.com
Accurately estimating the 3D pose and shape is an essential step towards understanding
animal behavior, and can potentially benefit many downstream applications, such as wildlife …

MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI

K Ying, F Meng, J Wang, Z Li, H Lin, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) show significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However …

Pose for everything: Towards category-agnostic pose estimation

L Xu, S Jin, W Zeng, W Liu, C Qian, W Ouyang… - European conference on …, 2022 - Springer
Existing works on 2D pose estimation mainly focus on a certain category, e.g., human, animal,
and vehicle. However, there are lots of application scenarios that require detecting the …