YOLOv9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HYM Liao - arXiv preprint arXiv:2402.13616, 2024 - arxiv.org
Today's deep learning methods focus on how to design the most appropriate objective
functions so that the prediction results of the model can be as close as possible to the ground truth …

Tip-Adapter: Training-free adaption of CLIP for few-shot classification

R Zhang, W Zhang, R Fang, P Gao, K Li, J Dai… - European conference on …, 2022 - Springer
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for
language and 2D image transformers. However, it remains an open question how to …

A survey of visual transformers

Y Liu, Y Zhang, Y Wang, F Hou, J Yuan… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …

Recent advances and perspectives in deep learning techniques for 3D point cloud data processing

Z Ding, Y Sun, S Xu, Y Pan, Y Peng, Z Mao - Robotics, 2023 - mdpi.com
In recent years, deep learning techniques for processing 3D point cloud data have seen
significant advancements, given their unique ability to extract relevant features and handle …

PiMAE: Point cloud and image interactive masked autoencoders for 3D object detection

A Chen, K Zhang, R Zhang, Z Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoders learn strong visual representations and achieve state-of-the-art
results in several independent modalities, yet very few works have addressed their …

CALIP: Zero-shot enhancement of CLIP with parameter-free attention

Z Guo, R Zhang, L Qiu, X Ma, X Miao, X He… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual
representations with promising zero-shot performance. To further improve its downstream …

Vision-centric BEV perception: A survey

Y Ma, T Wang, X Bai, H Yang, Y Hou, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …

Query-dependent video representation for moment retrieval and highlight detection

WJ Moon, S Hyun, SU Park, D Park… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, video moment retrieval and highlight detection (MR/HD) have been spotlighted as
the demand for video understanding has drastically increased. The key objective of MR/HD is …

You only need 90K parameters to adapt light: A lightweight transformer for image enhancement and exposure correction

Z Cui, K Li, L Gu, S Su, P Gao, Z Jiang, Y Qiao… - arXiv preprint arXiv …, 2022 - arxiv.org
Challenging illumination conditions (low-light, under-exposure, and over-exposure) in the
real world not only cause an unpleasant visual appearance but also taint the computer vision …