Generalized relation modeling for transformer tracking

S Gao, C Zhou, J Zhang - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which
allows earlier interaction between the template and search region, has achieved a …

ProPainter: Improving propagation and transformer for video inpainting

S Zhou, C Li, KCK Chan… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms
in video inpainting (VI). Despite the effectiveness of these components, they still suffer from …

Token merging for fast Stable Diffusion

D Bolya, J Hoffman - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
The landscape of image generation has been forever changed by open vocabulary diffusion
models. However, at their core these models use transformers, which makes generation …
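The core operator behind token merging (ToMe) is bipartite soft matching: split the tokens into two alternating sets, score cross-set cosine similarity, and average the r most similar pairs together. A minimal NumPy sketch of that idea, simplified from the paper (sequential averaging rather than a batched gather/scatter; the function name is mine):

```python
import numpy as np

def bipartite_soft_matching(tokens, r):
    """Merge the r most similar token pairs via bipartite soft matching.
    tokens: (N, d) array of token features. Returns an (N - r, d) array."""
    a, b = tokens[0::2], tokens[1::2]            # alternating split into two sets
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                              # cosine similarity across the sets
    best_b = sim.argmax(axis=1)                  # each a-token's best match in b
    best_sim = sim.max(axis=1)
    order = np.argsort(-best_sim)
    merge_idx, keep_idx = order[:r], order[r:]   # merge the r most similar a-tokens

    merged_b = b.copy()
    counts = np.ones(len(b))
    for i in merge_idx:                          # running average into the matched b-token
        j = best_b[i]
        merged_b[j] = (merged_b[j] * counts[j] + a[i]) / (counts[j] + 1)
        counts[j] += 1
    return np.concatenate([a[keep_idx], merged_b], axis=0)
```

Because merging only ever reduces the token count by a fixed r per layer, the speedup is predictable, and averaging (rather than dropping) preserves information from the removed tokens.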

Hydra attention: Efficient attention with many heads

D Bolya, CY Fu, X Dai, P Zhang, J Hoffman - European Conference on …, 2022 - Springer
While transformers have begun to dominate many tasks in vision, applying them to large
images is still computationally difficult. A large reason for this is that self-attention scales …
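Hydra attention addresses that quadratic scaling by taking multi-head attention to its limit of one head per feature channel, which lets the softmax be replaced with a decomposable cosine-similarity kernel and makes the cost linear in sequence length. A sketch of the decomposed form, under my reading of the paper (no batching, no projections):

```python
import numpy as np

def hydra_attention(q, k, v):
    """Hydra attention for a single sequence. q, k, v: (N, d) arrays.
    Computes out[t, c] = qn[t, c] * sum_s kn[s, c] * v[s, c] in O(N d)."""
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)  # cosine-similarity kernel
    kn = k / np.linalg.norm(k, axis=1, keepdims=True)
    kv = (kn * v).sum(axis=0)                          # global (d,) summary of keys/values
    return qn * kv                                     # broadcast back to every token
```

The key design point is the order of operations: aggregating `kn * v` over the sequence first yields a single d-dimensional summary, so no N-by-N attention matrix is ever formed.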

Joint token pruning and squeezing towards more aggressive compression of vision transformers

S Wei, T Ye, S Zhang, Y Tang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Although vision transformers (ViTs) have shown promising results in various computer vision
tasks recently, their high computational cost limits their practical applications. Previous …
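The "pruning and squeezing" idea can be illustrated in miniature: rather than discarding low-importance tokens outright, each pruned token is fused into its most similar kept token so its information survives aggressive compression. A simplified NumPy sketch of that scheme (the scoring, similarity measure, and function name here are my assumptions, not the paper's exact formulation):

```python
import numpy as np

def prune_and_squeeze(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens; fold ("squeeze") each pruned
    token into its most similar kept token by running average.
    tokens: (N, d); scores: (N,) importance scores. Returns (keep, d)."""
    order = np.argsort(-scores)
    kept_idx, pruned_idx = order[:keep], order[keep:]
    kept = tokens[kept_idx].copy()
    # Similarity is measured against the original kept tokens (fixed anchors).
    anchors = kept / np.linalg.norm(kept, axis=1, keepdims=True)
    counts = np.ones(keep)
    for i in pruned_idx:
        t = tokens[i]
        sim = anchors @ (t / np.linalg.norm(t))
        j = sim.argmax()                       # most similar kept token
        kept[j] = (kept[j] * counts[j] + t) / (counts[j] + 1)
        counts[j] += 1
    return kept
```

Compared with plain top-k pruning, the squeeze step means the discarded tokens still contribute to the retained representation, which is what permits more aggressive compression ratios.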

Less is more: Focus attention for efficient DETR

D Zheng, W Dong, H Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
DETR-like models have significantly boosted the performance of detectors and even
outperformed classical convolutional models. However, all tokens are treated equally …

Which tokens to use? Investigating token reduction in vision transformers

JB Haurum, S Escalera, GW Taylor… - Proceedings of the …, 2023 - openaccess.thecvf.com
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs
more efficient by removing redundant information in the processed tokens. While different …

SparseViT: Revisiting activation sparsity for efficient high-resolution vision transformer

X Chen, Z Liu, H Tang, L Yi… - Proceedings of the …, 2023 - openaccess.thecvf.com
High-resolution images enable neural networks to learn richer visual representations.
However, this improved performance comes at the cost of growing computational …

PPT: Token-pruned pose transformer for monocular and multi-view human pose estimation

H Ma, Z Wang, Y Chen, D Kong, L Chen, X Liu… - … on Computer Vision, 2022 - Springer
Recently, the vision transformer and its variants have played an increasingly important role
in both monocular and multi-view human pose estimation. Considering image patches as …

DiffRate: Differentiable compression rate for efficient vision transformers

M Chen, W Shao, P Xu, M Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning
(dropping) or merging tokens. It is an important but challenging task. Although recent …