A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (eg, BERT, ChatGPT, and GPT-4) is …

Object detection using YOLO: Challenges, architectural successors, datasets and applications

T Diwan, G Anirudh, JV Tembhurne - multimedia Tools and Applications, 2023 - Springer
Object detection is one of the predominant and challenging problems in computer vision.
Over the decade, with the expeditious evolution of deep learning, researchers have …

YOLOv6: A single-stage object detection framework for industrial applications

C Li, L Li, H Jiang, K Weng, Y Geng, L Li, Z Ke… - arXiv preprint arXiv …, 2022 - arxiv.org
For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Eva: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Segnext: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of se-mantic …

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

CY Wang, A Bochkovskiy… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Real-time object detection is one of the most important research topics in computer vision.
As new approaches regarding architecture optimization and training optimization are …

Hexplane: A fast representation for dynamic scenes

A Cao, J Johnson - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Modeling and re-rendering dynamic 3D scenes is a challenging task in 3D vision. Prior
approaches build on NeRF and rely on implicit representations. This is slow since it requires …

Videomae v2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

D Hong, B Zhang, H Li, Y Li, J Yao, C Li… - Remote Sensing of …, 2023 - Elsevier
Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-
modality-dominated remote sensing (RS) applications, especially with an emphasis on …