YOLOv9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HYM Liao - arXiv preprint arXiv:2402.13616, 2024 - arxiv.org
Today's deep learning methods focus on how to design the most appropriate objective
functions so that the prediction results of the model can be as close as possible to the ground truth …

Tip-Adapter: Training-free adaption of CLIP for few-shot classification

R Zhang, W Zhang, R Fang, P Gao, K Li, J Dai… - European conference on …, 2022 - Springer
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for
language and 2D image transformers. However, it remains an open question how to …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Recent advances and perspectives in deep learning techniques for 3D point cloud data processing

Z Ding, Y Sun, S Xu, Y Pan, Y Peng, Z Mao - Robotics, 2023 - mdpi.com
In recent years, deep learning techniques for processing 3D point cloud data have seen
significant advancements, given their unique ability to extract relevant features and handle …

PiMAE: Point cloud and image interactive masked autoencoders for 3D object detection

A Chen, K Zhang, R Zhang, Z Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoders learn strong visual representations and achieve state-of-the-art
results in several independent modalities, yet very few works have addressed their …

CALIP: Zero-shot enhancement of CLIP with parameter-free attention

Z Guo, R Zhang, L Qiu, X Ma, X Miao, X He… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual
representations with promising zero-shot performance. To further improve its downstream …

Vision-centric BEV perception: A survey

Y Ma, T Wang, X Bai, H Yang, Y Hou, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, video moment retrieval and highlight detection (MR/HD) have been spotlighted as
the demand for video understanding has drastically increased. The key objective of MR/HD is …

You only need 90K parameters to adapt light: A lightweight transformer for image enhancement and exposure correction

Z Cui, K Li, L Gu, S Su, P Gao, Z Jiang, Y Qiao… - arXiv preprint arXiv …, 2022 - arxiv.org
Challenging illumination conditions (low-light, under-exposure, and over-exposure) in the
real world not only cause an unpleasant visual appearance but also taint the computer vision …