- 学术资源搜索

Multimodal biomedical AI

JN Acosta, GJ Falcone, P Rajpurkar, EJ Topol - Nature Medicine, 2022 - nature.com

The increasing availability of biomedical data from large biobanks, electronic health records,
medical imaging, wearable and ambient biosensors, and the lower cost of genome and …

被引用次数：384 相关文章所有 6 个版本

Scientific discovery in the age of artificial intelligence

H Wang, T Fu, Y Du, W Gao, K Huang, Z Liu… - Nature, 2023 - nature.com

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment
and accelerate research, helping scientists to generate hypotheses, design experiments …

被引用次数：390 相关文章所有 7 个版本

[PDF] thecvf.com

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

被引用次数：3913 相关文章所有 7 个版本

[PDF] thecvf.com

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

被引用次数：545 相关文章所有 4 个版本

[PDF] neurips.cc

Laion-5b: An open large-scale dataset for training next generation image-text models

C Schuhmann, R Beaumont, R Vencu… - Advances in …, 2022 - proceedings.neurips.cc

Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of
training on large amounts of noisy image-text data, without relying on expensive accurate …

被引用次数：1833 相关文章所有 7 个版本

[PDF] aaai.org

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …

被引用次数：457 相关文章所有 2 个版本

[PDF] thecvf.com

Convnext v2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

被引用次数：420 相关文章所有 6 个版本

[PDF] thecvf.com

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

被引用次数：426 相关文章所有 8 个版本

[PDF] thecvf.com

Eva: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

被引用次数：406 相关文章所有 6 个版本

[PDF] thecvf.com

Scalable diffusion models with transformers

W Peebles, S Xie - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

We explore a new class of diffusion models based on the transformer architecture. We train
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …

被引用次数：515 相关文章所有 6 个版本

高级搜索

QQ 群

Multimodal biomedical AI

Scientific discovery in the age of artificial intelligence

Segment anything

Improved baselines with visual instruction tuning

Laion-5b: An open large-scale dataset for training next generation image-text models

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

Convnext v2: Co-designing and scaling convnets with masked autoencoders

Internimage: Exploring large-scale vision foundation models with deformable convolutions

Eva: Exploring the limits of masked visual representation learning at scale

Scalable diffusion models with transformers

引用