Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A PFM (eg, BERT, ChatGPT, GPT-4) is …
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest …
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
CY Wang, IH Yeh, HY Mark Liao - European Conference on Computer …, 2025 - Springer
Today's deep learning methods focus on how to design the objective functions to make the prediction as close as possible to the target. Meanwhile, an appropriate neural network …
Medical image segmentation is a critical component in clinical practice, facilitating accurate diagnosis, treatment planning, and disease monitoring. However, existing methods, often …
To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …
As the core building block of vision transformers, attention is a powerful tool to capture long- range dependency. However, such power comes at a cost: it incurs a huge computation …
Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
C Chi, Z Xu, S Feng, E Cousineau… - … Journal of Robotics …, 2023 - journals.sagepub.com
This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We …