Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Ansor: Generating high-performance tensor programs for deep learning

L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali… - … USENIX symposium on …, 2020 - usenix.org
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …

FeatGraph: A flexible and efficient backend for graph neural network systems

Y Hu, Z Ye, M Wang, J Yu, D Zheng, M Li… - … Conference for High …, 2020 - ieeexplore.ieee.org
Graph neural networks (GNNs) are gaining popularity as a promising approach to machine
learning on graphs. Unlike traditional graph workloads where each vertex/edge is …

Nimble: Efficiently compiling dynamic neural networks for model inference

H Shen, J Roesch, Z Chen, W Chen… - Proceedings of …, 2021 - proceedings.mlsys.org
Modern deep neural networks increasingly make use of features such as control flow,
dynamic data structures, and dynamic tensor shapes. Existing deep learning systems focus …

Model parallelism optimization for distributed inference via decoupled CNN structure

J Du, X Zhu, M Shen, Y Du, Y Lu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
It is promising to deploy CNN inference on local end-user devices for high-accuracy and
time-sensitive applications. Model parallelism has the potential to provide high throughput …

SparseTrain: Leveraging dynamic sparsity in software for training DNNs on general-purpose SIMD processors

Z Gong, H Ji, CW Fletcher, CJ Hughes… - Proceedings of the ACM …, 2020 - dl.acm.org
Our community has improved the efficiency of deep learning applications by exploiting
sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known …

Gem5-x: A many-core heterogeneous simulation platform for architectural exploration and optimization

YM Qureshi, WA Simon, M Zapater, K Olcoz… - ACM Transactions on …, 2021 - dl.acm.org
The increasing adoption of smart systems in our daily life has led to the development of new
applications with varying performance and energy constraints, and suitable computing …

Efficient execution of quantized deep learning models: A compiler approach

A Jain, S Bhattacharya, M Masuda, V Sharma… - arXiv preprint arXiv …, 2020 - arxiv.org
A growing number of applications implement predictive functions using deep learning
models, which require heavy use of compute and memory. One popular technique for …

GoldenEye: A platform for evaluating emerging numerical data formats in DNN accelerators

A Mahmoud, T Tambe, T Aloui… - 2022 52nd Annual …, 2022 - ieeexplore.ieee.org
This paper presents GoldenEye, a functional simulator with fault injection capabilities for
common and emerging numerical formats, implemented for the PyTorch deep learning …

Analyzing deep learning model inferences for image classification using OpenVINO

Z Jin, H Finkel - 2020 IEEE International Parallel and …, 2020 - ieeexplore.ieee.org
It may be desirable to execute deep learning model inferences on an integrated GPU at the
edge. While such GPUs are much less powerful than discrete GPUs, they are able to deliver …