Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on …

SpatialVLM: Endowing vision-language models with spatial reasoning capabilities

B Chen, Z Xu, S Kirmani, B Ichter… - Proceedings of the …, 2024 - openaccess.thecvf.com
Understanding and reasoning about spatial relationships is crucial for Visual Question
Answering (VQA) and robotics. Vision Language Models (VLMs) have shown impressive …

UniverSeg: Universal medical image segmentation

VI Butoi, JJG Ortiz, T Ma, MR Sabuncu… - Proceedings of the …, 2023 - openaccess.thecvf.com
While deep learning models have become the predominant method for medical image
segmentation, they are typically not capable of generalizing to unseen segmentation tasks …

Skeleton-in-context: Unified skeleton sequence modeling with in-context learning

X Wang, Z Fang, X Li, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
In-context learning provides a new perspective for multi-task modeling for vision and NLP.
Under this setting, the model can perceive tasks from prompts and accomplish them without …

Time does tell: Self-supervised time-tuning of dense image representations

M Salehi, E Gavves, CGM Snoek… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatially dense self-supervised learning is a rapidly growing problem domain with
promising applications for unsupervised segmentation and pretraining for dense …

Explore in-context learning for 3D point cloud understanding

Z Fang, X Li, X Li, JM Buhmann… - Advances in Neural …, 2024 - proceedings.neurips.cc
With the rise of large-scale models trained on broad data, in-context learning has become a
new learning paradigm that has demonstrated significant potential in natural language …

Context-aware meta-learning

C Fifty, D Duan, RG Junkins, E Amid… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new
concepts during inference without any fine-tuning. However, visual models trained to detect …

Open-vocabulary SAM: Segment and recognize twenty-thousand classes interactively

H Yuan, X Li, C Zhou, Y Li, K Chen, CC Loy - arXiv preprint arXiv …, 2024 - arxiv.org
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models
(VFMs). SAM excels in segmentation tasks across diverse domains, while CLIP is renowned …

Explore in-context segmentation via latent diffusion models

C Wang, X Li, H Ding, L Qi, J Zhang, Y Tong… - arXiv preprint arXiv …, 2024 - arxiv.org
In-context segmentation has drawn more attention with the introduction of vision foundation
models. Most existing approaches adopt metric learning or masked image modeling to build …

Tyche: Stochastic In-Context Learning for Medical Image Segmentation

M Rakic, HE Wong, JJG Ortiz… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing learning-based solutions to medical image segmentation have two important
shortcomings. First, for most new segmentation tasks, a new model has to be trained or fine …