Large-scale deep neural networks (DNNs) are both compute- and memory-intensive. As the size of DNNs continues to grow, it is critical to improve the energy efficiency and …
In natural language processing (NLP), the "Transformer" architecture was proposed as the first transduction model relying entirely on self-attention mechanisms without using …
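For reference, a minimal sketch of the scaled dot-product self-attention at the core of such an architecture (NumPy only; the projection matrices Wq, Wk, Wv, the single-head setup, and the toy shapes are illustrative assumptions, not details taken from the cited work):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) input embeddings.
    Wq, Wk, Wv: (d_model, d_k) projection matrices (illustrative).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise attention logits, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                          # attention-weighted sum of values

# Toy usage with random data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (5, 8)
```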
Recently, significant accuracy improvements have been achieved for acoustic recognition systems by increasing the model size of Long Short-Term Memory (LSTM) networks …
Deep neural networks (DNNs), as the basis of object detection, will play a key role in the development of future fully autonomous systems. The autonomous systems …
Recurrent Neural Networks (RNNs) are becoming increasingly important for time-series-related applications that require efficient and real-time implementations. The two major …
We present an efficient coresets-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably …
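As a rough illustration of coreset-style sparsification, the sketch below keeps a fixed budget of weights of a fully-connected layer by importance sampling and reweights them so the layer is preserved in expectation. The magnitude-proportional sampling probabilities and the function name are simplifying assumptions for this sketch; the cited work derives data-dependent sensitivities and formal guarantees.

```python
import numpy as np

def sparsify_layer(W, m, rng=np.random.default_rng(0)):
    """Keep at most m weights of a layer via importance sampling.

    Each entry is sampled with probability proportional to its magnitude
    (an assumption made here for simplicity) and re-scaled by 1/(m * p)
    so that the sparsified matrix equals W in expectation.
    """
    p = np.abs(W).ravel()
    p = p / p.sum()                                   # sampling distribution over entries
    idx = rng.choice(W.size, size=m, replace=True, p=p)
    keep = np.zeros(W.size)
    np.add.at(keep, idx, W.ravel()[idx] / (m * p[idx]))  # unbiased reweighting of sampled entries
    return keep.reshape(W.shape)

W = np.random.default_rng(1).standard_normal((256, 128))
W_hat = sparsify_layer(W, m=4000)                     # at most 4000 non-zero entries remain
```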
Weight pruning methods of deep neural networks (DNNs) have been demonstrated to achieve a good model pruning ratio without loss of accuracy, thereby alleviating the …
Weight pruning methods of deep neural networks (DNNs) have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the …
Many model compression techniques for Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering, and quantization. Weight …
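To make two of these techniques concrete, the following sketch applies global magnitude pruning and then simple weight clustering (one-dimensional k-means) to a weight matrix. The sparsity level, cluster count, initialization, and iteration count are illustrative choices, not the specific methods of the works listed above.

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity` of them become zero."""
    thresh = np.quantile(np.abs(W), sparsity)
    return np.where(np.abs(W) >= thresh, W, 0.0)

def cluster_weights(W, k=16, iters=20):
    """Share weights by snapping each non-zero entry to one of k centroids (1-D k-means)."""
    nz = W[W != 0.0]
    centroids = np.linspace(nz.min(), nz.max(), k)        # simple linear initialization
    for _ in range(iters):
        assign = np.abs(nz[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = nz[assign == c].mean()     # move centroid to its cluster mean
    Wq = W.copy()
    Wq[W != 0.0] = centroids[assign]                      # replace weights with shared values
    return Wq

W = np.random.default_rng(2).standard_normal((64, 64))
Wq = cluster_weights(magnitude_prune(W, 0.9), k=16)       # pruned, then clustered weights
```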