Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi …
With the increasing size of DNN models and the growing discrepancy between compute performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
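The memory-traffic argument behind layer fusion can be illustrated with a minimal NumPy sketch (a generic illustration, not the output of any specific compiler mentioned here): an unfused chain of elementwise ops materializes every intermediate tensor, while a fused version streams each element through all ops in one pass.

```python
import numpy as np

def unfused(x):
    # Each step writes a full intermediate array back to memory:
    # extra round-trips of reads/writes for y and z.
    y = x * 2.0
    z = y + 1.0
    return np.maximum(z, 0.0)

def fused(x):
    # A fused kernel applies all three ops per element,
    # touching memory once per element instead of three times.
    out = np.empty_like(x)
    for i in range(x.size):
        v = x.flat[i] * 2.0 + 1.0
        out.flat[i] = v if v > 0.0 else 0.0
    return out

x = np.array([-1.0, 0.5, 2.0])
assert np.allclose(unfused(x), fused(x))
```

The two functions compute the same result; the difference is only in how many times each element crosses the memory interface, which is exactly the traffic that off-chip-bandwidth-bound models pay for.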
Machine learning models built from diverse tensor operators have become ubiquitous in recent years. There are two types of operators in machine learning: compute-intensive operators …
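The compute-intensive vs. memory-intensive split is usually made precise via arithmetic intensity (FLOPs per byte moved). A back-of-envelope sketch, assuming fp32 operands and counting only one ideal read/write of each tensor (a simplifying assumption, ignoring caches and tiling):

```python
def arithmetic_intensity_matmul(n, bytes_per_elem=4):
    # n x n matrix multiply: 2*n^3 FLOPs, moving A, B, C once each.
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_elem
    return flops / bytes_moved  # grows linearly with n -> compute-bound

def arithmetic_intensity_add(n, bytes_per_elem=4):
    # Elementwise add: one FLOP per element, still moving three tensors.
    flops = n * n
    bytes_moved = 3 * n * n * bytes_per_elem
    return flops / bytes_moved  # constant 1/12 -> memory-bound

assert arithmetic_intensity_matmul(1024) > arithmetic_intensity_add(1024)
```

Matmul's intensity scales as n/6 and so saturates compute units at large sizes, while the elementwise op stays pinned at 1/12 FLOP per byte regardless of size, which is why it is bandwidth-bound.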
L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a paradigm shift in supporting low-precision computation to harness the robustness of deep …
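Low-precision computation typically means running inference in a narrow integer format such as int8. A minimal sketch of symmetric per-tensor int8 quantization (a generic scheme, not the specific method of the paper above):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization:
    # map the range [-max|w|, +max|w|] onto [-127, 127].
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original fp32 values.
    return q.astype(np.float32) * scale

w = np.array([0.1, -0.5, 0.25], dtype=np.float32)
q, s = quantize_int8(w)
# Rounding error is bounded by half a quantization step.
assert np.max(np.abs(dequantize(q, s) - w)) <= s / 2 + 1e-6
```

The appeal is that int8 matrix multiplies run on dedicated integer units at several times fp32 throughput while quartering memory traffic, and DNNs are often robust enough to tolerate the bounded rounding error.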
B Hagedorn, B Fan, H Chen, C Cecka… - Proceedings of the 28th …, 2023 - dl.acm.org
Modern GPUs accelerate computations and data movements of multi-dimensional tensors in hardware. However, expressing optimized tensor computations in software is extremely …
We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to external functions of other tensor algebra libraries and compilers. Users can extend Mosaic …
The deep learning (DL) compiler serves as a vital infrastructure component to enable the deployment of deep neural networks on diverse hardware platforms such as mobile devices …
S Zheng, S Chen, Y Liang - 2023 60th ACM/IEEE Design …, 2023 - ieeexplore.ieee.org
DNN models are now pervasively used for various applications. Meanwhile, computing hardware has shifted toward heterogeneous systems composed of various …
As deep learning models are now widely adopted by both cloud services and edge devices, reducing the latency of deep learning model inference becomes crucial to provide …