Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
improvement in general-purpose processors due to the foreseeable end of Moore's Law …

A comprehensive survey on optimizing deep learning models by metaheuristics

B Akay, D Karaboga, R Akay - Artificial Intelligence Review, 2022 - Springer
Deep neural networks (DNNs), which are extensions of artificial neural networks, can learn
higher levels of a feature hierarchy built from lower-level features by transforming the raw …

ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning

S Rajbhandari, O Ruwase, J Rasley, S Smith… - Proceedings of the …, 2021 - dl.acm.org
In the last three years, the largest dense deep learning models have grown over 1000x to
reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 …

ZeRO: Memory optimizations toward training trillion parameter models

S Rajbhandari, J Rasley, O Ruwase… - … Conference for High …, 2020 - ieeexplore.ieee.org
Large deep learning models offer significant accuracy gains, but training billions to trillions
of parameters is challenging. Existing solutions such as data and model parallelisms exhibit …

PipeDream: Generalized pipeline parallelism for DNN training

D Narayanan, A Harlap, A Phanishayee… - Proceedings of the 27th …, 2019 - dl.acm.org
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …

Machine learning at Facebook: Understanding inference at the edge

CJ Wu, D Brooks, K Chen, D Chen… - … symposium on high …, 2019 - ieeexplore.ieee.org
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …

P3: Distributed deep graph learning at scale

S Gandhi, AP Iyer - 15th USENIX Symposium on Operating Systems …, 2021 - usenix.org
Graph Neural Networks (GNNs) have gained significant attention in the recent past and
have become one of the fastest-growing subareas in deep learning. While several new GNN …

RecNMP: Accelerating personalized recommendation with near-memory processing

L Ke, U Gupta, BY Cho, D Brooks… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …

Memory-efficient pipeline-parallel DNN training

D Narayanan, A Phanishayee, K Shi… - International …, 2021 - proceedings.mlr.press
Many state-of-the-art ML results have been obtained by scaling up the number of
parameters in existing models. However, parameters and activations for such large models …

Checkmate: Breaking the memory wall with optimal tensor rematerialization

P Jain, A Jain, A Nrusimha, A Gholami… - Proceedings of …, 2020 - proceedings.mlsys.org
Modern neural networks are increasingly bottlenecked by the limited capacity of on-device
GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural …