A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …

Machine learning methods for small data challenges in molecular science

B Dou, Z Zhu, E Merkurjev, L Ke, L Chen… - Chemical …, 2023 - ACS Publications
Small data are often used in scientific and engineering research due to the presence of
various constraints, such as time, cost, ethics, privacy, security, and technical limitations in …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and a performance boost in the early …

Accurate medium-range global weather forecasting with 3D neural networks

K Bi, L Xie, H Zhang, X Chen, X Gu, Q Tian - Nature, 2023 - nature.com
Weather forecasting is important for science and society. At present, the most accurate
forecast system is the numerical weather prediction (NWP) method, which represents …

Consistency models

Y Song, P Dhariwal, M Chen, I Sutskever - arXiv preprint arXiv:2303.01469, 2023 - arxiv.org
Diffusion models have significantly advanced the fields of image, audio, and video
generation, but they depend on an iterative sampling process that causes slow generation …

Scaling language-image pre-training via masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scale is the primary factor in building a powerful foundation model that can generalize well
to a variety of downstream tasks. However, it is still challenging to train video …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …