MART: Masked affective representation learning via masked temporal distribution distillation

Z Zhang, P Zhao, E Park… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Limited training data is a long-standing problem for video emotion analysis (VEA). Existing
works leverage the power of large-scale image datasets for transferring while failing to …

Adapt or perish: Adaptive sparse transformer with attentive feature refinement for image restoration

S Zhou, D Chen, J Pan, J Shi… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Transformer-based approaches have achieved promising performance in image restoration
tasks, given their ability to model long-range dependencies, which is crucial for recovering …

ExtDM: Distribution extrapolation diffusion model for video prediction

Z Zhang, J Hu, W Cheng, D Paudel… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video prediction is a challenging task due to its inherent uncertainty, especially when
forecasting a long period. To model the temporal dynamics, advanced methods benefit from …

AlignSAM: Aligning segment anything model to open context via reinforcement learning

D Huang, X Xiong, J Ma, J Li, Z Jie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Powered by massive curated training data, the Segment Anything Model (SAM) has
demonstrated its impressive generalization capabilities in open-world scenarios with the …

LAKE-RED: Camouflaged images generation by latent background knowledge retrieval-augmented diffusion

P Zhao, P Xu, P Qin, DP Fan, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Camouflaged vision perception is an important vision task with numerous practical
applications. Due to the expensive collection and labeling costs, this community struggles …

Ordinal label distribution learning

C Wen, X Zhang, X Yao, J Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Label distribution learning (LDL) is a recent hot topic, in which ambiguity is modeled via
description degrees of the labels. However, in common LDL tasks, e.g., age estimation, labels …

Timeline and Boundary Guided Diffusion Network for Video Shadow Detection

H Zhou, H Wang, T Ye, Z Xing, J Ma, P Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Video Shadow Detection (VSD) aims to detect the shadow masks with frame sequence.
Existing works suffer from inefficient temporal learning. Moreover, few works address the …

PlaneAC: Line-guided planar 3D reconstruction based on self-attention and convolution hybrid model

J Zhang, J Yang, F Fu, J Ma - Pattern Recognition, 2024 - Elsevier
Planar 3D reconstruction aims to simultaneously extract plane instances and reconstruct the
local 3D model through the estimated plane parameters. Existing methods achieve …

IST-Net: Prior-free category-level pose estimation with implicit space transformation

J Liu, Y Chen, X Ye, X Qi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects
from a specific category. Thanks to prior deformation, which explicitly adapts a category …

Annotation-efficient polyp segmentation via active learning

D Huang, X Xiong, DJ Fan, F Gao, XJ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep learning-based techniques have proven effective in polyp segmentation tasks when
provided with sufficient pixel-wise labeled data. However, the high cost of manual …