Think holistically, act down-to-earth: A semantic navigation strategy with continuous environmental representation and multi-step forward planning

B Chen, J Kang, P Zhong, Y Cui, S Lu, Y Liang, J Wang
IEEE Transactions on Circuits and Systems for Video Technology, 2023 - ieeexplore.ieee.org
The Object goal Navigation (ObjectNav) task requires an agent to navigate through a previously unknown domestic scene using spatial and semantic contextual information, where the goal is specified by a semantic label (e.g., find a TV). The task is especially challenging because it requires modeling and understanding the complex co-occurrence relations among objects in diverse settings, which is critical for long-sequence navigational decision-making. Existing methods either explicitly represent co-occurrence relationships as discrete semantic priors or implicitly encode them from raw observations, and thus cannot benefit from the rich environmental semantics. In this work, we propose a novel Deep Reinforcement Learning (DRL) based ObjectNav strategy that actively imagines spatial and semantic clues outside the agent’s Field of View (FoV) and further mines Continuous Environmental Representations (CER) using self-supervised learning. Additionally, the imagined spatial and semantic patterns allow the agent to perform Multi-Step Forward-Looking Planning (MSFLP) by considering the temporal evolution of egocentric local observations. Our approach is thoroughly evaluated and ablated in the visually realistic environments of the Matterport3D (MP3D) dataset. The experimental results show that our method, which combines CER and imagination-based MSFLP, facilitates learning complicated semantic priors and navigation skills, thus achieving state-of-the-art performance on the ObjectNav task. In addition, quantitative and qualitative analyses validate the strong generalization ability and superiority of our method.