An 879GOPS 243mW 80fps VGA fully visual CNN-SLAM processor for wide-range autonomous exploration

Z. Li, Y. Chen, L. Gong, L. Liu, D. Sylvester, et al. - 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019 - ieeexplore.ieee.org
Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory in all six degrees of freedom (6 DoF) and constructs a 3D map of its unknown surroundings. It is a fundamental kernel that enables head-mounted augmented/virtual reality devices and autonomous navigation of micro aerial vehicles. A notable recent trend in visual SLAM is to apply computation- and memory-intensive convolutional neural networks (CNNs), which outperform traditional hand-designed feature-based methods [1]. For each video frame, CNN-extracted features are matched against stored keypoints to estimate the agent’s 6-DoF pose by solving a perspective-n-points (PnP) non-linear optimization problem (Fig. 7.3.1, left). The agent’s long-term trajectory over multiple frames is refined by a bundle adjustment process (BA, Fig. 7.3.1, right), which involves a large-scale (120 variables) non-linear optimization.

Visual SLAM requires massive computation (GOP/s) for the CNN-based feature extraction and matching, as well as data-dependent dynamic memory access and control flow with high-precision operations, creating significant low-power design challenges. Software implementations are impractical, resulting in 0.2 s runtime on a 3 GHz CPU + GPU system with a MB memory footprint and W power consumption. Prior ASICs have either implemented an incomplete SLAM system [2, 3] that lacks ego-motion estimation or employed simplified (non-CNN) feature extraction and tracking [2, 4, 5] that limits SLAM quality and range. A recent ASIC [5] augments visual SLAM with an off-chip high-precision inertial measurement unit (IMU), reducing computational complexity but incurring additional power and cost overhead.
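Both the PnP step and bundle adjustment are non-linear least-squares problems over reprojection error. Below is a minimal sketch of the per-frame pose estimation step, assuming a pinhole camera model and using SciPy's generic least-squares solver; the function names, synthetic data, and Huber robust loss are illustrative assumptions, not the paper's hardware implementation.

```python
# Minimal PnP sketch: recover a 6-DoF pose by minimizing reprojection error.
# Illustrative only; not the processor's actual dataflow or solver.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(points_3d, rvec, tvec, K):
    """Project 3D map points into the image with pose (rvec, tvec) and intrinsics K."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = points_3d @ R.T + tvec           # world frame -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]          # perspective division
    return uv @ K[:2, :2].T + K[:2, 2]     # apply focal lengths and principal point


def reprojection_residual(pose, points_3d, keypoints_2d, K):
    """Stacked pixel residuals for one frame; pose = [rx, ry, rz, tx, ty, tz]."""
    return (project(points_3d, pose[:3], pose[3:], K) - keypoints_2d).ravel()


def estimate_pose(points_3d, keypoints_2d, K, pose_init=None):
    """Solve the PnP problem: find the 6-DoF pose minimizing reprojection error."""
    pose0 = np.zeros(6) if pose_init is None else pose_init
    result = least_squares(reprojection_residual, pose0,
                           args=(points_3d, keypoints_2d, K),
                           loss="huber", f_scale=1.0)  # robust loss for outlier matches
    return result.x


if __name__ == "__main__":
    # Synthetic check with assumed intrinsics: the solver should recover true_pose.
    rng = np.random.default_rng(0)
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    true_pose = np.array([0.05, -0.02, 0.01, 0.1, -0.05, 0.3])
    pts = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
    obs = project(pts, true_pose[:3], true_pose[3:], K)
    print(estimate_pose(pts, obs, K))
```

Bundle adjustment extends the same reprojection residual across a window of keyframes, jointly refining all of their poses (and, in full BA, the map points as well), which is the source of the large-scale non-linear optimization mentioned above.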