We present a differentiable joint pruning and quantization (DJPQ) scheme. We frame neural network compression as a joint gradient-based optimization problem, trading off between model pruning and quantization automatically for hardware efficiency. DJPQ incorporates variational information bottleneck-based structured pruning and mixed-bit precision quantization into a single differentiable loss function. In contrast to previous works, which consider pruning and quantization separately, our method enables users to find the optimal …
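To make the idea of a single differentiable loss concrete, the following is a minimal PyTorch-style sketch, not the paper's actual formulation: the gate parameterization, the straight-through quantizer, the cost terms (vib_penalty, bit_cost), and the hyperparameters beta and gamma are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DJPQLayer(nn.Module):
    """Toy linear layer with a learnable pruning gate per output channel
    and a learnable (relaxed) bit-width for weight quantization.
    Illustrative stand-in, not the paper's implementation."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        # parameter of VIB-style soft gates, one per output channel
        self.gate_logalpha = nn.Parameter(torch.zeros(out_features))
        # continuous bit-width, relaxed so the cost term stays differentiable
        self.bits = nn.Parameter(torch.tensor(8.0))

    def forward(self, x):
        # soft gate in (0, 1); channels whose gate shrinks toward 0 are pruned
        gate = torch.sigmoid(-self.gate_logalpha)
        # fake quantization with a straight-through estimator for the weights
        n_levels = 2.0 ** self.bits
        scale = self.weight.abs().max() / (n_levels / 2)
        w_q = self.weight + (torch.round(self.weight / scale) * scale
                             - self.weight).detach()
        return nn.functional.linear(x, w_q * gate.unsqueeze(1))

    def vib_penalty(self):
        # simplified pruning pressure: push the soft gates toward zero
        return torch.sigmoid(-self.gate_logalpha).sum()

    def bit_cost(self):
        # proxy for hardware cost: active channels times bit-width
        active = torch.sigmoid(-self.gate_logalpha).sum()
        return active * self.bits

layer = DJPQLayer(16, 8)
x = torch.randn(4, 16)
task_loss = layer(x).pow(2).mean()   # stand-in for the real task loss
beta, gamma = 1e-2, 1e-4             # illustrative trade-off weights
loss = task_loss + beta * layer.vib_penalty() + gamma * layer.bit_cost()
loss.backward()                      # one backward pass updates weights, gates, and bits
```

A single backward pass produces gradients for the weights, the pruning gates, and the bit-width at once, which is the sense in which pruning and quantization are traded off jointly. In this toy version the bit-width is driven only by the cost term; a full scheme would also pass gradients through the quantizer itself.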