Author
Despoina Paschalidou
Publication date
2021
Institution
ETH Zurich
Description
Humans develop a common-sense understanding of the physical behaviour of the world within the first year of their life. We are able to identify 3D objects in a scene, infer their geometric and physical properties, predict physical events in dynamic environments and act based on our interaction with the world. Our understanding of our surroundings relies heavily on our ability to properly reason about the arrangement of elements in a scene. Inspired by early works in cognitive science that stipulate that the human visual system perceives objects as a collection of semantically coherent parts and in turn uses them to easily associate unknown objects with object parts whose functionality is already known, researchers developed compositional representations capable of capturing the functional composition and spatial arrangement of objects and object parts in a scene. In the first two parts of this dissertation, we propose learning-based solutions for recovering the 3D object geometry using semantically consistent part arrangements. Finally, we introduce a network architecture that synthesizes indoor environments as object arrangements, whose functional composition and spatial configuration follow clear patterns that are directly inferred from data. First, we present an unsupervised learning-based approach for recovering shape abstractions using superquadric surfaces as atomic elements. We demonstrate that superquadrics lead to more expressive part decompositions while being easier to learn than cuboidal primitives. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive …