Generalized relation modeling for transformer tracking

S Gao, C Zhou, J Zhang - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which
allows earlier interaction between the template and search region, has achieved a …

ProPainter: Improving propagation and transformer for video inpainting

S Zhou, C Li, KCK Chan… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms
in video inpainting (VI). Despite the effectiveness of these components, they still suffer from …

Token merging for fast Stable Diffusion

D Bolya, J Hoffman - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
The landscape of image generation has been forever changed by open vocabulary diffusion
models. However, at their core these models use transformers, which makes generation …
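The core operator behind token merging (ToMe) is bipartite soft matching: split the tokens into two alternating sets, score cross-set cosine similarity, and average the r most similar pairs together. A minimal NumPy sketch of that idea, simplified from the paper (sequential averaging rather than a batched gather/scatter; the function name is mine):

```python
import numpy as np

def bipartite_soft_matching(tokens, r):
    """Merge the r most similar token pairs via bipartite soft matching.
    tokens: (N, d) array of token features. Returns an (N - r, d) array."""
    a, b = tokens[0::2], tokens[1::2]            # alternating split into two sets
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                              # cosine similarity across the sets
    best_b = sim.argmax(axis=1)                  # each a-token's best match in b
    best_sim = sim.max(axis=1)
    order = np.argsort(-best_sim)
    merge_idx, keep_idx = order[:r], order[r:]   # merge the r most similar a-tokens

    merged_b = b.copy()
    counts = np.ones(len(b))
    for i in merge_idx:                          # running average into the matched b-token
        j = best_b[i]
        merged_b[j] = (merged_b[j] * counts[j] + a[i]) / (counts[j] + 1)
        counts[j] += 1
    return np.concatenate([a[keep_idx], merged_b], axis=0)
```

Because merging only ever reduces the token count by a fixed r per layer, the speedup is predictable, and averaging (rather than dropping) preserves information from the removed tokens.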

Hydra attention: Efficient attention with many heads

D Bolya, CY Fu, X Dai, P Zhang, J Hoffman - European Conference on …, 2022 - Springer
While transformers have begun to dominate many tasks in vision, applying them to large
images is still computationally difficult. A large reason for this is that self-attention scales …
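Hydra attention addresses that quadratic scaling by taking multi-head attention to its limit of one head per feature channel, which lets the softmax be replaced with a decomposable cosine-similarity kernel and makes the cost linear in sequence length. A sketch of the decomposed form, under my reading of the paper (no batching, no projections):

```python
import numpy as np

def hydra_attention(q, k, v):
    """Hydra attention for a single sequence. q, k, v: (N, d) arrays.
    Computes out[t, c] = qn[t, c] * sum_s kn[s, c] * v[s, c] in O(N d)."""
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)  # cosine-similarity kernel
    kn = k / np.linalg.norm(k, axis=1, keepdims=True)
    kv = (kn * v).sum(axis=0)                          # global (d,) summary of keys/values
    return qn * kv                                     # broadcast back to every token
```

The key design point is the order of operations: aggregating `kn * v` over the sequence first yields a single d-dimensional summary, so no N-by-N attention matrix is ever formed.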

Joint token pruning and squeezing towards more aggressive compression of vision transformers

S Wei, T Ye, S Zhang, Y Tang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Although vision transformers (ViTs) have shown promising results in various computer vision
tasks recently, their high computational cost limits their practical applications. Previous …
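The "pruning and squeezing" idea can be illustrated in miniature: rather than discarding low-importance tokens outright, each pruned token is fused into its most similar kept token so its information survives aggressive compression. A simplified NumPy sketch of that scheme (the scoring, similarity measure, and function name here are my assumptions, not the paper's exact formulation):

```python
import numpy as np

def prune_and_squeeze(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens; fold ("squeeze") each pruned
    token into its most similar kept token by running average.
    tokens: (N, d); scores: (N,) importance scores. Returns (keep, d)."""
    order = np.argsort(-scores)
    kept_idx, pruned_idx = order[:keep], order[keep:]
    kept = tokens[kept_idx].copy()
    # Similarity is measured against the original kept tokens (fixed anchors).
    anchors = kept / np.linalg.norm(kept, axis=1, keepdims=True)
    counts = np.ones(keep)
    for i in pruned_idx:
        t = tokens[i]
        sim = anchors @ (t / np.linalg.norm(t))
        j = sim.argmax()                       # most similar kept token
        kept[j] = (kept[j] * counts[j] + t) / (counts[j] + 1)
        counts[j] += 1
    return kept
```

Compared with plain top-k pruning, the squeeze step means the discarded tokens still contribute to the retained representation, which is what permits more aggressive compression ratios.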

Less is more: Focus attention for efficient DETR

D Zheng, W Dong, H Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
DETR-like models have significantly boosted the performance of detectors and even
outperformed classical convolutional models. However, all tokens are treated equally …

Which tokens to use? Investigating token reduction in vision transformers

JB Haurum, S Escalera, GW Taylor… - Proceedings of the …, 2023 - openaccess.thecvf.com
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs
more efficient by removing redundant information in the processed tokens. While different …

SparseViT: Revisiting activation sparsity for efficient high-resolution vision transformer

X Chen, Z Liu, H Tang, L Yi… - Proceedings of the …, 2023 - openaccess.thecvf.com
High-resolution images enable neural networks to learn richer visual representations.
However, this improved performance comes at the cost of growing computational …

PPT: Token-pruned pose transformer for monocular and multi-view human pose estimation

H Ma, Z Wang, Y Chen, D Kong, L Chen, X Liu… - … on Computer Vision, 2022 - Springer
Recently, the vision transformer and its variants have played an increasingly important role
in both monocular and multi-view human pose estimation. Considering image patches as …

DiffRate: Differentiable compression rate for efficient vision transformers

M Chen, W Shao, P Xu, M Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning
(dropping) or merging tokens. It is an important but challenging task. Although recent …