- Academic resource search

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

Cited by 414 · Related articles · All 2 versions

[PDF] nature.com

Self-supervised learning for medical image classification: a systematic review and implementation guidelines

SC Huang, A Pareek, M Jensen, MP Lungren… - NPJ Digital …, 2023 - nature.com

Advancements in deep learning and computer vision provide promising solutions for
medical image analysis, potentially improving healthcare and patient outcomes. However …

Cited by 111 · Related articles · All 8 versions

[PDF] arxiv.org

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org

The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Cited by 1043 · Related articles · All 11 versions

[PDF] thecvf.com

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Cited by 360 · Related articles · All 8 versions

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by 212 · Related articles · All 7 versions

[PDF] neurips.cc

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc

Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

Cited by 169 · Related articles · All 12 versions

[HTML] sciencedirect.com

[HTML] Deep learning in food category recognition

Y Zhang, L Deng, H Zhu, W Wang, Z Ren, Q Zhou… - Information …, 2023 - Elsevier

Integrating artificial intelligence with food category recognition has been a field of interest for
research for the past few decades. It is potentially one of the next steps in revolutionizing …

Cited by 203 · Related articles · All 4 versions

[PDF] arxiv.org

SpectralGPT: Spectral remote sensing foundation model

D Hong, B Zhang, X Li, Y Li, C Li, J Yao… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …

Cited by 195 · Related articles · All 6 versions

[PDF] thecvf.com

Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Cited by 162 · Related articles · All 7 versions

[PDF] ieee.org

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org

Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

Cited by 1451 · Related articles · All 6 versions
