SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Y Miao, F Engelmann, O Vysotska, F Tombari… - arXiv preprint arXiv …, 2024 - arxiv.org
arXiv preprint arXiv:2404.00469, 2024
We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method, SceneGraphLoc, learns a fixed-size embedding for each node (i.e., representing an object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. The code will be made public.
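The matching idea in the abstract (one fixed-size embedding per scene-graph node, compared against the objects visible in a query image) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity scoring rule, the function names (`cosine`, `scene_score`, `localize`), and the toy 3-dimensional embeddings are all assumptions made for illustration, and the learned encoders that would produce real embeddings are omitted entirely.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def scene_score(query_embs, node_embs):
    """Assumed scoring rule: for each query object, take its best
    similarity to any node of the scene, then average over objects."""
    if not query_embs or not node_embs:
        return 0.0
    best_per_object = [max(cosine(q, n) for n in node_embs) for q in query_embs]
    return sum(best_per_object) / len(best_per_object)

def localize(query_embs, scene_db):
    """Return (scene_id, score) for the highest-scoring scene graph."""
    scored = ((sid, scene_score(query_embs, nodes)) for sid, nodes in scene_db.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical database: scene id -> list of node embeddings (toy values).
scene_db = {
    "kitchen": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "office": [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
}

# Hypothetical embeddings of two objects detected in the query image.
query = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.0]]

best_scene, best_score = localize(query, scene_db)
print(best_scene, round(best_score, 3))
```

Because each scene is summarized by a handful of fixed-size vectors instead of a full image database, a lookup of this kind only needs to store and compare small embedding lists, which is consistent with the storage and speed advantages the abstract reports.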