Multimodal biomedical AI

JN Acosta, GJ Falcone, P Rajpurkar, EJ Topol - Nature Medicine, 2022 - nature.com
The increasing availability of biomedical data from large biobanks, electronic health records,
medical imaging, wearable and ambient biosensors, and the lower cost of genome and …

Scientific discovery in the age of artificial intelligence

H Wang, T Fu, Y Du, W Gao, K Huang, Z Liu… - Nature, 2023 - nature.com
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment
and accelerate research, helping scientists to generate hypotheses, design experiments …

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

Laion-5b: An open large-scale dataset for training next generation image-text models

C Schuhmann, R Beaumont, R Vencu… - Advances in …, 2022 - proceedings.neurips.cc
Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of
training on large amounts of noisy image-text data, without relying on expensive accurate …

T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …

Convnext v2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Internimage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Eva: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Scalable diffusion models with transformers

W Peebles, S Xie - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We explore a new class of diffusion models based on the transformer architecture. We train
latent diffusion models of images, replacing the commonly-used U-Net backbone with a …