This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for …
Q Fang, Y Fan, Y Li, J Dong, D Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we focus on capturing closely interacted two-person motions from monocular videos, an important yet understudied topic. Unlike less-interacted motions, closely interacted …
L Zhou, X Meng, Z Liu, M Wu, Z Gao… - arXiv preprint arXiv …, 2023 - arxiv.org
Human pose analysis has garnered significant attention in both research and practical applications, owing to its expanding array of uses, including gaming, video …
J Jeong, D Park, KJ Yoon - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Human pose forecasting garners attention for its diverse applications. However, challenges in modeling the multi-modal nature of human motion and intricate interactions among agents …
D Tan, H Chen, W Tian, L Xiong - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper presents DiffusionRegPose, a novel approach to multi-person pose estimation that converts a one-stage end-to-end keypoint regression model into a diffusion …
Y Luo, S Cui, Z Li - arXiv preprint arXiv:2406.16072, 2024 - arxiv.org
Accurate 3D lane estimation is crucial for ensuring safety in autonomous driving. However, prevailing monocular techniques suffer from depth loss and lighting variations, hampering …
Tracking the articulated poses of multiple individuals in complex videos is a highly challenging task due to a variety of factors that compromise the accuracy of estimation and …
Y Dang, J Yin, L Liu, P Ding, Y Sun, Y Hu - Knowledge-Based Systems, 2024 - Elsevier
Multi-person pose estimation (MPPE) is a challenging yet crucial task in computer vision. Most existing methods predominantly concentrate on isolated interaction, either …
J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that unifies visual perception, understanding, and generation within a single framework. Unlike …