Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

Artificial intelligence in the IoT era: A review of edge AI hardware and software

T Sipola, J Alatalo, T Kokkonen… - 2022 31st Conference …, 2022 - ieeexplore.ieee.org
The modern trend of moving artificial intelligence computation near to the origin of data
sources has increased the demand for new hardware and software suitable for such …

Benchmarking TPU, GPU, and CPU platforms for deep learning

YE Wang, GY Wei, D Brooks - arXiv preprint arXiv:1907.10701, 2019 - arxiv.org
Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware specialization to improve performance. To systematically benchmark …

Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions

N Vasilache, O Zinenko, T Theodoridis, P Goyal… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …

TASO: optimizing deep learning computation with automatic generation of graph substitutions

Z Jia, O Padon, J Thomas, T Warszawski… - Proceedings of the 27th …, 2019 - dl.acm.org
Existing deep neural network (DNN) frameworks optimize the computation graph of a DNN
by applying graph transformations manually designed by human experts. This approach …

Bias in machine learning models can be significantly mitigated by careful training: Evidence from neuroimaging studies

R Wang, P Chaudhari… - Proceedings of the …, 2023 - National Acad Sciences
Despite the great promise that machine learning has offered in many fields of medicine, it
has also raised concerns about potential biases and poor generalization across genders …

Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high-performance code for multiple platforms including multicores, GPUs, and distributed …

Learning to optimize halide with tree search and random programs

A Adams, K Ma, L Anderson, R Baghdadi… - ACM Transactions on …, 2019 - dl.acm.org
We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the …

Glow: Graph lowering compiler techniques for neural networks

N Rotem, J Fix, S Abdulrasool, G Catron… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper presents the design of Glow, a machine learning compiler for heterogeneous
hardware. It is a pragmatic approach to compilation that enables the generation of highly …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …