SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …

Image2Point: 3D point-cloud understanding with 2D image pretrained models

C Xu, S Yang, T Galanti, B Wu, X Yue, B Zhai… - … on Computer Vision, 2022 - Springer
3D point-clouds and 2D images are different visual representations of the physical
world. While human vision can understand both representations, computer vision models …

LCPFormer: Towards effective 3D point cloud analysis via local context propagation in transformers

Z Huang, Z Zhao, B Li, J Han - IEEE Transactions on Circuits …, 2023 - ieeexplore.ieee.org
Transformer with its underlying attention mechanism and the ability to capture long-range
dependencies makes it become a natural choice for unordered point cloud data. However …

IRISformer: Dense vision transformers for single-image inverse rendering in indoor scenes

R Zhu, Z Li, J Matai, F Porikli… - Proceedings of the …, 2022 - openaccess.thecvf.com
Indoor scenes exhibit significant appearance variations due to myriad interactions between
arbitrarily diverse object shapes, spatially-changing materials, and complex lighting …

DELFlow: Dense efficient learning of scene flow for large-scale point clouds

C Peng, G Wang, XW Lo, X Wu, C Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Point clouds are naturally sparse, while image pixels are dense. The inconsistency limits
feature fusion from both modalities for point-wise scene flow estimation. Previous methods …

DetMatch: Two teachers are better than one for joint 2D and 3D semi-supervised object detection

J Park, C Xu, Y Zhou, M Tomizuka, W Zhan - European Conference on …, 2022 - Springer
While numerous 3D detection works leverage the complementary relationship between RGB
images and point clouds, developments in the broader framework of semi-supervised object …

Visual transformers: Where do transformers really belong in vision models?

B Wu, C Xu, X Dai, A Wan, P Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
A recent trend in computer vision is to replace convolutions with transformers. However, the
performance gain of transformers is attained at a steep cost, requiring GPU years and …

Open-vocabulary 3D detection via image-level class and debiased cross-modal contrastive learning

Y Lu, C Xu, X Wei, X Xie, M Tomizuka… - arXiv preprint arXiv …, 2022 - arxiv.org
Current point-cloud detection methods have difficulty detecting the open-vocabulary objects
in the real world, due to their limited generalization capability. Moreover, it is extremely …

Collect-and-distribute transformer for 3D point cloud analysis

H Qiu, B Yu, D Tao - arXiv preprint arXiv:2306.01257, 2023 - arxiv.org
Remarkable advancements have been made recently in point cloud analysis through the
exploration of transformer architecture, but it remains challenging to effectively learn local …

A simple and efficient multi-task network for 3D object detection and road understanding

D Feng, Y Zhou, C Xu, M Tomizuka… - 2021 IEEE/RSJ …, 2021 - ieeexplore.ieee.org
Detecting dynamic objects and predicting static road information such as drivable areas and
ground heights are crucial for safe autonomous driving. Previous works studied each …