Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
improvement in general-purpose processors due to the foreseeable end of Moore's Law …

A comprehensive survey on optimizing deep learning models by metaheuristics

B Akay, D Karaboga, R Akay - Artificial Intelligence Review, 2022 - Springer
Deep neural networks (DNNs), which are extensions of artificial neural networks, can learn
higher levels of a feature hierarchy built from lower-level features by transforming the raw …

ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning

S Rajbhandari, O Ruwase, J Rasley, S Smith… - Proceedings of the …, 2021 - dl.acm.org
In the last three years, the largest dense deep learning models have grown over 1000x to
reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 …

ZeRO: Memory optimizations toward training trillion parameter models

S Rajbhandari, J Rasley, O Ruwase… - … Conference for High …, 2020 - ieeexplore.ieee.org
Large deep learning models offer significant accuracy gains, but training billions to trillions
of parameters is challenging. Existing solutions such as data and model parallelisms exhibit …

PipeDream: Generalized pipeline parallelism for DNN training

D Narayanan, A Harlap, A Phanishayee… - Proceedings of the 27th …, 2019 - dl.acm.org
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …

Machine learning at Facebook: Understanding inference at the edge

CJ Wu, D Brooks, K Chen, D Chen… - … symposium on high …, 2019 - ieeexplore.ieee.org
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …

P3: Distributed deep graph learning at scale

S Gandhi, AP Iyer - 15th USENIX Symposium on Operating Systems …, 2021 - usenix.org
Graph Neural Networks (GNNs) have gained significant attention in the recent past and
have become one of the fastest-growing subareas in deep learning. While several new GNN …

RecNMP: Accelerating personalized recommendation with near-memory processing

L Ke, U Gupta, BY Cho, D Brooks… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Personalized recommendation systems leverage deep learning models and account for the
majority of data center AI cycles. Their performance is dominated by memory-bound sparse …

Memory-efficient pipeline-parallel DNN training

D Narayanan, A Phanishayee, K Shi… - International …, 2021 - proceedings.mlr.press
Many state-of-the-art ML results have been obtained by scaling up the number of
parameters in existing models. However, parameters and activations for such large models …

Checkmate: Breaking the memory wall with optimal tensor rematerialization

P Jain, A Jain, A Nrusimha, A Gholami… - Proceedings of …, 2020 - proceedings.mlsys.org
Modern neural networks are increasingly bottlenecked by the limited capacity of on-device
GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural …