Visual parser: Representing part-whole hierarchies with transformers

X Lai, Y Chen, F Lu, J Liu, J Jia - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

LiDAR-based 3D point cloud recognition has benefited various applications. Without
specially considering the LiDAR point distribution, most current methods suffer from …

Cited by 147 · Related articles · All 6 versions

[PDF] thecvf.com

Stratified transformer for 3D point cloud segmentation

X Lai, J Liu, L Jiang, L Wang, H Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com

3D point cloud segmentation has made tremendous progress in recent years. Most
current methods focus on aggregating local features, but fail to directly model long-range …

Cited by 446 · Related articles · All 9 versions

[PDF] baai.ac.cn

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Cited by 2551 · Related articles · All 7 versions

[PDF] acm.org

Object-centric learning with capsule networks: A survey

F De Sousa Ribeiro, K Duarte, M Everett… - ACM Computing …, 2024 - dl.acm.org

Capsule networks emerged as a promising alternative to convolutional neural networks for
learning object-centric representations. The idea is to explicitly model part-whole hierarchies …

Cited by 2 · Related articles · All 6 versions

[PDF] arxiv.org

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org

Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Cited by 387 · Related articles · All 3 versions

[PDF] thecvf.com

MixFormer: Mixing features across windows and dimensions

Q Chen, Q Wu, J Wang, Q Hu, T Hu… - Proceedings of the …, 2022 - openaccess.thecvf.com

While local-window self-attention performs notably in vision tasks, it suffers from limited
receptive field and weak modeling capability issues. This is mainly because it performs self …

Cited by 143 · Related articles · All 6 versions

[PDF] thecvf.com

ObjectFormer for image manipulation detection and localization

J Wang, Z Wu, J Chen, X Han… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recent advances in image editing techniques have posed serious challenges to the
trustworthiness of multimedia data, which drives the research of image tampering detection …

Cited by 151 · Related articles · All 5 versions

[PDF] thecvf.com

Mask-attention-free transformer for 3D instance segmentation

X Lai, Y Yuan, R Chu, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, transformer-based methods have dominated 3D instance segmentation, where
mask attention is commonly involved. Specifically, object queries are guided by the initial …

Cited by 30 · Related articles · All 5 versions

[PDF] thecvf.com

Joint global and local hierarchical priors for learned image compression

JH Kim, B Heo, JS Lee - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

Recently, learned image compression methods have outperformed traditional hand-crafted
ones including BPG. One of the keys to this success is learned entropy models that estimate …

Cited by 80 · Related articles · All 6 versions

[PDF] thecvf.com

Making vision transformers efficient from a token sparsification view

S Chang, P Wang, M Lin, F Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

The quadratic computational complexity to the number of tokens limits the practical
applications of Vision Transformers (ViTs). Several works propose to prune redundant …

Cited by 26 · Related articles · All 5 versions
