This article discusses the opportunities, applications and future directions of large-scale pretrained models, ie, foundation models, which promise to significantly improve the …
A Sauer, D Lorenz, A Blattmann… - European Conference on …, 2025 - Springer
Abstract We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while …
The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. The success of such applications …
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial …
Abstract This work presents Depth Anything a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules we aim to build a simple yet …
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices …
Multimodal Large Language Model (MLLM) recently has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform …
The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …
This work presents AnyDoor a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations with desired shapes. Instead of …