Authors
Dhiraj D Kalamkar, Kunal Banerjee, Sudarshan Srinivasan, Srinivas Sridharan, Evangelos Georganas, Mikhail E Smorkalov, Cong Xu, Alexander Heinecke
Publication date
2019/9/23
Conference
2019 IEEE International Conference on Cluster Computing (CLUSTER)
Pages
1-10
Publisher
IEEE
Description
Google's neural machine translation (GNMT) is a state-of-the-art language translation application based on recurrent neural networks (RNN/LSTM). It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency machine to make optimal use of fast on-die memories for the best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work that attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this by the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for …
Total citations
2019: 1, 2020: 3, 2021: 1, 2022: 1, 2023: 1
Scholar articles
DD Kalamkar, K Banerjee, S Srinivasan, S Sridharan… - 2019 IEEE International Conference on Cluster …, 2019