holistic, multi-level pipelining scheme enabled by local error prediction. It introduces the
local error prediction unit to make the training algorithm pipelineable, while using
power-of-two arithmetic operations and random weights to reduce computation overhead
and overall external memory access. Its double-buffered PIM macro is designed to
perform both forward propagation and gradient calculation, while the dual-sparsity-aware …