Indoor scene understanding in 2.5/3D for autonomous agents: A survey

M Naseer, S Khan, F Porikli - IEEE Access, 2018 - ieeexplore.ieee.org
With the availability of low-cost and compact 2.5/3D visual sensing devices, computer vision
community is experiencing a growing interest in visual scene understanding of indoor …

Kimera: From SLAM to spatial perception with 3D dynamic scene graphs

A Rosinol, A Violette, M Abate… - … Journal of Robotics …, 2021 - journals.sagepub.com
Humans are able to form a complex mental model of the environment they move in. This
mental model captures geometric and semantic aspects of the scene, describes the …

ImVoxelNet: Image to voxels projection for monocular and multi-view general-purpose 3D object detection

D Rukhovich, A Vorontsova… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we introduce the task of multi-view RGB-based 3D object detection as an
end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully …

Knowledge-embedded routing network for scene graph generation

T Chen, W Yu, R Chen, L Lin - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
To understand a scene in depth not only involves locating/recognizing individual objects, but
also requires to infer the relationships and interactions among them. However, since the …

3D scene graph: A structure for unified semantics, 3D space, and camera

I Armeni, ZY He, JY Gwak, AR Zamir… - Proceedings of the …, 2019 - openaccess.thecvf.com
A comprehensive semantic understanding of a scene is important for many applications, but
in what space should diverse semantic information (e.g., objects, scene categories, material …

CubeSLAM: Monocular 3-D object SLAM

S Yang, S Scherer - IEEE Transactions on Robotics, 2019 - ieeexplore.ieee.org
In this paper, we present a method for single image three-dimensional (3-D) cuboid object
detection and multiview object simultaneous localization and mapping in both static and …

Visual relationship detection with language priors

C Lu, R Krishna, M Bernstein, L Fei-Fei - … 11–14, 2016, Proceedings, Part I …, 2016 - Springer
Visual relationships capture a wide variety of interactions between pairs of objects in images
(e.g., “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible …

Visual Genome: Connecting language and vision using crowdsourced dense image annotations

R Krishna, Y Zhu, O Groth, J Johnson, K Hata… - International journal of …, 2017 - Springer
Despite progress in perceptual tasks such as image classification, computers still perform
poorly on cognitive tasks such as image description and question answering. Cognition is …

Scene graph generation from objects, phrases and region captions

Y Li, W Ouyang, B Zhou, K Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com
Object detection, scene graph generation and region captioning, which are three scene
understanding tasks at different semantic levels, are tied together: scene graphs are …

Detecting visual relationships with deep relational networks

B Dai, Y Zhang, D Lin - … of the IEEE conference on computer …, 2017 - openaccess.thecvf.com
Relationships among objects play a crucial role in image understanding. Despite the great
success of deep learning techniques in recognizing individual objects, reasoning about the …