Implicit temporal modeling with learnable alignment for video recognition

S Tu, Q Dai, Z Wu, ZQ Cheng, H Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in
various image tasks. However, how to extend CLIP with effective temporal modeling is still …

PoSynDA: Multi-hypothesis pose synthesis domain adaptation for robust 3D human pose estimation

H Liu, JY He, ZQ Cheng, W Xiang, Q Yang… - Proceedings of the 31st …, 2023 - dl.acm.org
The current 3D human pose estimators face challenges in adapting to new datasets due to
the scarcity of 2D-3D pose pairs in target domain training sets. We present the Multi …

HDFormer: High-order directed transformer for 3D human pose estimation

H Chen, JY He, W Xiang, ZQ Cheng, W Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Human pose estimation is a challenging task due to its structured data sequence nature.
Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient …

Improving anomaly segmentation with multi-granularity cross-domain alignment

J Zhang, X Wu, ZQ Cheng, Q He, W Li - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Anomaly segmentation plays a crucial role in identifying anomalous objects within images,
which facilitates the detection of road anomalies for autonomous driving. Although existing …

Adaptive optimization strategy and evaluation of vehicle-road collaborative perception algorithm in real-time settings

J Liu, B Gao, W Zhong, Y Lu, S Han - Computers and Electrical …, 2024 - Elsevier
The Intelligent and Connected Vehicle Cloud Control System is a critical approach
for achieving high-level autonomous driving. One of the key challenges at the perception …

Explainable Safety Argumentation for the Deployment of Automated Vehicles

P Weissensteiner, G Stettinger - Electronics, 2024 - mdpi.com
With over 1.6 million traffic deaths in 2016, automated vehicles equipped with automated
driving systems (ADSs) have the potential to increase traffic safety by assuming human …

Fast Fourier inception networks for occluded video prediction

P Li, C Zhang, X Xu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Video prediction is a pixel-level task that generates future frames by employing the historical
frames. There often exist continuous complex motions, such as object overlapping and …

VN-MADDPG: A variable-noise-based multi-agent reinforcement learning algorithm for autonomous vehicles at unsignalized intersections

H Zhang, Y Du, S Zhao, Y Yuan, Q Gao - Electronics, 2024 - mdpi.com
The decision-making performance of autonomous vehicles tends to be unstable at
unsignalized intersections, making it difficult for them to make optimal decisions. We …

HTACPE: A Hybrid Transformer with Adaptive Content and Position Embedding for Sample Learning Efficiency of Hyperspectral Tracker

Y Wang, S Mei, M Ma, Y Liu, Y Su - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Transformer architecture has demonstrated significant potential in hyperspectral object
tracking by leveraging global correlation learning to accurately represent the data …

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

X Bao, ZQ Cheng, JY He, W Xiang, C Li, J Sun… - Proceedings of the 31st …, 2023 - dl.acm.org
In the realm of facial analysis, accurate landmark detection is crucial for various applications,
ranging from face recognition and expression analysis to animation. Conventional heatmap …