Comprehensive 3D scene understanding, both geometrically and semantically, is important for real-world applications such as robot perception. Most of the existing work has focused …
Generative models excel at mimicking real scenes, suggesting they might inherently encode important intrinsic scene properties. In this paper, we aim to explore the following key …
S Zheng, Z Bao, M Hebert… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Multi-task visual learning is a critical aspect of computer vision. Current research, however, predominantly concentrates on the multi-task dense prediction setting, which overlooks the …
We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are …
Multi-task learning has become increasingly popular in the machine learning field but its practicality is hindered by the need for large labeled datasets. Most multi-task learning …
Representation disentanglement may help AI fundamentally understand the real world and thus benefit both discrimination and generation tasks. It currently has at least three …
K Gili, G Alonso, M Schuld - Quantum Machine Intelligence, 2024 - Springer
There are two major approaches to building good machine learning algorithms: feeding lots of data into large models or picking a model class with an “inductive bias” that suits the …
Beyond high-fidelity image synthesis, diffusion models have recently exhibited promising results in dense visual perception tasks. However, most existing work treats diffusion models …
Y Lu, S Cao, YX Wang - arXiv preprint arXiv:2410.14633, 2024 - arxiv.org
Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks. However, due to their inherent representation biases …