Self-supervised remote sensing feature learning: Learning paradigms, challenges, and future works

C Tao, J Qi, M Guo, Q Zhu, H Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning has achieved great success in learning features from massive remote
sensing images (RSIs). To better understand the connection between three feature learning …

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com
To design fast neural networks, many works have focused on reducing the number of
floating-point operations (FLOPs). We observe that such a reduction in FLOPs, however, does …
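
As context for the snippet above: FasterNet's speedups come from a partial convolution (PConv) operator that convolves only a slice of the input channels and passes the rest through untouched. A minimal PyTorch sketch of that idea; the 1/4 channel ratio and all names here are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Convolve a fraction of channels; pass the rest through untouched.
    This cuts memory accesses as well as FLOPs, which is what keeps
    effective FLOPS (throughput) high on real hardware."""
    def __init__(self, channels: int, ratio: float = 0.25, kernel_size: int = 3):
        super().__init__()
        self.cp = int(channels * ratio)
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = x.split([self.cp, x.shape[1] - self.cp], dim=1)
        return torch.cat([self.conv(head), tail], dim=1)

x = torch.randn(1, 64, 56, 56)
print(PConv(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```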

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and a performance boost in the early …
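
A central piece of ConvNeXt V2's co-design is the Global Response Normalization (GRN) layer added to each ConvNeXt block. A minimal sketch of GRN, assuming PyTorch and a channels-last (N, H, W, C) layout; the eps value is an assumption.

```python
import torch
import torch.nn as nn

class GRN(nn.Module):
    """Global Response Normalization over a channels-last tensor."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, H, W, C). Aggregate: L2 norm of each channel over space.
        gx = torch.linalg.vector_norm(x, ord=2, dim=(1, 2), keepdim=True)
        # Normalize: each channel's response relative to the channel mean.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
        # Calibrate with a learnable affine transform plus a residual path.
        return self.gamma * (x * nx) + self.beta + x
```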

MixFormer: End-to-end tracking with iterative mixed attention

Y Cui, C Jiang, L Wang, G Wu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Tracking often uses a multi-stage pipeline of feature extraction, target information
integration, and bounding box estimation. To simplify this pipeline and unify the process of …
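
The "mixed attention" in the title refers to letting template and search-region tokens attend to each other in one operation, so feature extraction and target-information integration happen jointly. A simplified sketch using a stock attention layer; MixFormer's actual module differs in detail (e.g., its asymmetric variant), and all dimensions here are illustrative.

```python
import torch
import torch.nn as nn

# One attention layer shared by both token sets.
mixed_attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

template = torch.randn(1, 64, 256)   # tokens from the target template
search = torch.randn(1, 256, 256)    # tokens from the search region
mixed = torch.cat([template, search], dim=1)

# Every token attends to both sources, mixing extraction and integration.
out, _ = mixed_attn(mixed, mixed, mixed)
print(out.shape)  # torch.Size([1, 320, 256])
```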

Tip-Adapter: Training-free adaption of CLIP for few-shot classification

R Zhang, W Zhang, R Fang, P Gao, K Li, J Dai… - European conference on …, 2022 - Springer
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new
paradigm for learning visual representations using large-scale image-text pairs. It shows …
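
Tip-Adapter's training-free adaptation stores few-shot CLIP features as a key-value cache and blends the cache's logits with CLIP's zero-shot logits. A minimal sketch assuming pre-extracted, L2-normalized features; the alpha and beta values are illustrative defaults, not prescribed settings.

```python
import torch

def tip_adapter_logits(test_feats, cache_keys, cache_values, clip_weights,
                       alpha=1.0, beta=5.5):
    """test_feats: (B, D) image features; cache_keys: (N*K, D) few-shot image
    features; cache_values: (N*K, C) one-hot labels; clip_weights: (C, D)
    class text features. All features assumed L2-normalized."""
    # Zero-shot CLIP logits from image-text similarity.
    clip_logits = 100.0 * test_feats @ clip_weights.t()
    # Cache model: affinity of each test feature to every stored key ...
    affinity = torch.exp(-beta * (1.0 - test_feats @ cache_keys.t()))
    # ... turns the nearest few-shot neighbors into class evidence.
    cache_logits = affinity @ cache_values
    return clip_logits + alpha * cache_logits
```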

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the …

Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training on large-scale image data has become the de facto approach for learning robust 2D
representations. In contrast, due to expensive data processing, a paucity of 3D datasets severely hinders …
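
This entry, like several others in the list, builds on masked autoencoding (MAE). A minimal sketch of the shared per-sample random token-masking step, assuming PyTorch; the 75% ratio follows the original MAE recipe rather than any specific paper here.

```python
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """tokens: (B, N, D). Returns kept tokens plus indices to restore order."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)  # one random score per token
    ids_shuffle = noise.argsort(dim=1)              # random permutation
    ids_restore = ids_shuffle.argsort(dim=1)        # inverse, to unshuffle later
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(tokens, 1, ids_keep[:, :, None].expand(-1, -1, D))
    return kept, ids_restore
```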

Unmasked teacher: Towards training-efficient video foundation models

K Li, Y Wang, Y Li, Y Wang, Y He… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video Foundation Models (VFMs) have received limited exploration due to high
computational costs and data scarcity. Previous VFMs rely on Image Foundation Models …

Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning

CJ Reed, R Gupta, S Li, S Brockman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to
mimic different conditions and scales, with the resulting models used for various tasks with …
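
Scale-MAE's scale awareness comes from a positional encoding tied to ground sample distance (GSD), so patches covering the same physical extent get consistent encodings across image resolutions. A minimal 1-D sketch of that idea; the paper's exact formulation differs in detail, and the reference GSD here is an assumed convention.

```python
import torch

def gsd_pos_encoding(positions: torch.Tensor, dim: int, gsd: float,
                     reference_gsd: float = 1.0) -> torch.Tensor:
    """positions: (N,) float patch coordinates; returns (N, dim) encoding."""
    # Rescale coordinates into physical units via the GSD ratio, so a patch
    # covering the same ground extent encodes the same at any resolution.
    scaled = positions * (gsd / reference_gsd)
    freqs = torch.pow(10000.0, -torch.arange(0, dim, 2) / dim)
    angles = scaled[:, None] * freqs[None, :]          # (N, dim // 2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
```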

FlowFormer++: Masked cost volume autoencoding for pretraining optical flow estimation

X Shi, Z Huang, D Li, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
FlowFormer introduces a transformer architecture into optical flow estimation and achieves
state-of-the-art performance. The core component of FlowFormer is the transformer-based …
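
The cost volume in question is the all-pairs correlation between feature maps of two frames, which FlowFormer++ masks and reconstructs during pretraining. A minimal sketch of cost-volume construction, assuming PyTorch; the sqrt(C) scaling follows common optical-flow practice (e.g., RAFT) and is an assumption here.

```python
import torch

def all_pairs_cost_volume(feat1: torch.Tensor, feat2: torch.Tensor) -> torch.Tensor:
    """feat1, feat2: (B, C, H, W) frame features; returns (B, H*W, H*W)."""
    B, C, H, W = feat1.shape
    f1 = feat1.flatten(2).transpose(1, 2)     # (B, H*W, C)
    f2 = feat2.flatten(2)                     # (B, C, H*W)
    return torch.matmul(f1, f2) / C ** 0.5    # scaled dot-product similarity
```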