A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

Object detection using YOLO: Challenges, architectural successors, datasets and applications

T Diwan, G Anirudh, JV Tembhurne - Multimedia Tools and Applications, 2023 - Springer
Object detection is one of the predominant and challenging problems in computer vision.
Over the decade, with the expeditious evolution of deep learning, researchers have …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

OneFormer: One transformer to rule universal image segmentation

J Jain, J Li, MT Chiu, A Hassani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Universal Image Segmentation is not a new concept. Past attempts to unify image
segmentation include scene parsing, panoptic segmentation, and, more recently, new …

Generalized decoding for pixel, image, and language

X Zou, ZY Dou, J Yang, Z Gan, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present X-Decoder, a generalized decoding model that can predict pixel-level
segmentation and language tokens seamlessly. X-Decoder takes as input two types of …

Diffusion art or digital forgery? Investigating data replication in diffusion models

G Somepalli, V Singla, M Goldblum… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cutting-edge diffusion models produce images with high quality and customizability,
enabling them to be used for commercial art and graphic design purposes. But do diffusion …

Side adapter network for open-vocabulary semantic segmentation

M Xu, Z Zhang, F Wei, H Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents a new framework for open-vocabulary semantic segmentation with the
pre-trained vision-language model, named SAN. Our approach models the semantic …

Mask DINO: Towards a unified transformer-based framework for object detection and segmentation

F Li, H Zhang, H Xu, S Liu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper we present Mask DINO, a unified object detection and segmentation
framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by …

EVA-CLIP: Improved training techniques for CLIP at scale

Q Sun, Y Fang, L Wu, X Wang, Y Cao - arXiv preprint arXiv:2303.15389, 2023 - arxiv.org
Contrastive language-image pre-training, CLIP for short, has gained increasing attention for
its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …