- Academic Resource Search

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

Cited by 406 · Related articles · All 2 versions

[PDF] springer.com

Object detection using YOLO: Challenges, architectural successors, datasets and applications

T Diwan, G Anirudh, JV Tembhurne - Multimedia Tools and Applications, 2023 - Springer

Object detection is one of the predominant and challenging problems in computer vision.
Over the decade, with the expeditious evolution of deep learning, researchers have …

Cited by 512 · Related articles · All 7 versions

[PDF] arxiv.org

YOLOv6: A single-stage object detection framework for industrial applications

C Li, L Li, H Jiang, K Weng, Y Geng, L Li, Z Ke… - arXiv preprint arXiv …, 2022 - arxiv.org

For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …

Cited by 1511 · Related articles · All 3 versions

[PDF] thecvf.com

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Cited by 478 · Related articles · All 8 versions

[PDF] thecvf.com

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Cited by 435 · Related articles · All 5 versions

[PDF] neurips.cc

SegNeXt: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc

We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of semantic …

Cited by 417 · Related articles · All 6 versions

[PDF] thecvf.com

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

CY Wang, A Bochkovskiy… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Real-time object detection is one of the most important research topics in computer vision.
As new approaches regarding architecture optimization and training optimization are …

Cited by 6079 · Related articles · All 10 versions

[PDF] thecvf.com

HexPlane: A fast representation for dynamic scenes

A Cao, J Johnson - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior
approaches build on NeRF and rely on implicit representations. This is slow since it requires …

Cited by 238 · Related articles · All 6 versions

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by 197 · Related articles · All 7 versions

Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

D Hong, B Zhang, H Li, Y Li, J Yao, C Li… - Remote Sensing of …, 2023 - Elsevier

Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-
modality-dominated remote sensing (RS) applications, especially with an emphasis on …

Cited by 172 · Related articles · All 5 versions
