A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

SimMIM: A simple framework for masked image modeling

Z Xie, Z Zhang, Y Cao, Y Lin, J Bao… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper presents SimMIM, a simple framework for masked image modeling. It
simplifies recently proposed related approaches, without the need for special designs …

Masked siamese networks for label-efficient learning

M Assran, M Caron, I Misra, P Bojanowski… - … on Computer Vision, 2022 - Springer
We propose Masked Siamese Networks (MSN), a self-supervised learning
framework for learning image representations. Our approach matches the representation of …

VATT: Transformers for multimodal self-supervised learning from raw video, audio and text

H Akbari, L Yuan, R Qian… - Advances in …, 2021 - proceedings.neurips.cc
We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …

Extended vision transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework

J Yao, B Zhang, C Li, D Hong… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The recent success of attention mechanism-driven deep models, with the vision transformer (ViT)
as one of the most representative examples, has triggered a wave of advanced research to explore …

BEVT: BERT pretraining of video transformers

R Wang, D Chen, Z Wu, Y Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper studies the BERT pretraining of video transformers. It is a straightforward but
worthwhile extension given the recent success of BERT pretraining of image …

Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization

L Melas-Kyriazi, C Rupprecht… - Proceedings of the …, 2022 - openaccess.thecvf.com
Unsupervised localization and segmentation are long-standing computer vision challenges
that involve decomposing an image into semantically-meaningful segments without any …

GAN-based anomaly detection: A review

X Xia, X Pan, N Li, X He, L Ma, X Zhang, N Ding - Neurocomputing, 2022 - Elsevier
Supervised learning algorithms have shown limited use in the field of anomaly detection due
to the unpredictability of abnormal samples and the difficulty of acquiring them. In recent years …

Convolutional neural networks for multimodal remote sensing data classification

X Wu, D Hong, J Chanussot - IEEE Transactions on Geoscience …, 2021 - ieeexplore.ieee.org
In recent years, enormous efforts have been made to improve the classification
performance of single-modal remote sensing (RS) data. However, with the ever-growing …