The rapidly changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full …
J Xing, L Wang, S Zhang, J Chen… - … of Machine Learning …, 2022 - proceedings.mlsys.org
Today's auto-tuners (e.g., AutoTVM, Ansor) generate efficient tensor programs by navigating a large search space to identify effective implementations, but they do so with opaque …
Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
We show in this work that memory intensive computations can result in severe performance problems due to off-chip memory access and CPU-GPU context switch overheads in a wide …
This article proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most deep learning compilers that focus on training or …
The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast …
Designing efficient deep learning architectures is a challenging task that requires balancing performance and hardware efficiency. Neural Architecture Search (NAS) has emerged as a …
Deep learning has achieved remarkable success across various domains, leading to the development of increasingly complex and resource-intensive models. For that, these models …
Today, machine learning offers a variety of services in the industry, including research, translation, recommendation systems and security. Deep learning in particular has led to …
Machine learning compilers rely on making optimized decisions in order to generate efficient code for a given computation graph. Many of these decision making …