ROLLER: Fast and efficient tensor compilation for deep learning

H Zhu, R Wu, Y Diao, S Ke, H Li, C Zhang… - … USENIX Symposium on …, 2022 - usenix.org
Despite recent advances in tensor compilers, it often costs hours to generate an efficient
kernel for an operator, a compute-intensive sub-task in a deep neural network (DNN), on …

EdgeMoE: Fast on-device inference of MoE-based large language models

R Yi, L Guo, S Wei, A Zhou, S Wang, M Xu - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) such as GPTs and LLaMa have ushered in a revolution in
machine intelligence, owing to their exceptional capabilities in a wide range of machine …

MobiDepth: Real-time depth estimation using on-device dual cameras

J Zhang, H Yang, J Ren, D Zhang, B He, T Cao… - Proceedings of the 28th …, 2022 - dl.acm.org
Real-time depth estimation is critical for the increasingly popular augmented reality and
virtual reality applications on mobile devices. Yet existing solutions are insufficient as they …

LUT-NN: Empower efficient neural network inference with centroid learning and table lookup

X Tang, Y Wang, T Cao, LL Zhang, Q Chen… - Proceedings of the 29th …, 2023 - dl.acm.org
On-device Deep Neural Network (DNN) inference consumes significant computing
resources and development efforts. To alleviate that, we propose LUT-NN, the first system to …

Safe and Practical GPU Computation in TrustZone

H Park, FX Lin - Proceedings of the Eighteenth European Conference …, 2023 - dl.acm.org
For mobile devices, it is compelling to run sensitive GPU computation within a TrustZone
trusted execution environment (TEE). To minimize GPU software deployed in TEE, the …

Heterogeneous Parallel Acceleration for Edge Intelligence Systems: Challenges and Solutions

J Zhang, C Zhou, H Yang, D Zhang… - IEEE Consumer …, 2024 - ieeexplore.ieee.org
The rapid advancement of edge artificial intelligence (AI) can be attributed to the widespread
use of edge consumer devices and the enhancement in System-on-Chip (SoC) capabilities …

DynaSpa: Exploiting Spatial Sparsity for Efficient Dynamic DNN Inference on Devices

R Liu, Y Leng, S Tian, S Hu, CF Chen… - Proceedings of the 22nd …, 2024 - dl.acm.org
Recent advancements in exploring machine learning models' dynamic spatial sparsity have
demonstrated great potential for superior efficiency and adaptability without compromising …

Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization

F Jia, S Jiang, T Cao, W Cui, T Xia, X Cao, Y Li… - Proceedings of the …, 2024 - dl.acm.org
Web is increasingly becoming the primary platform to deliver AI services onto edge devices,
making in-browser deep learning (DL) inference more prominent. Nevertheless, the …

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

W Niu, MMR Sanim, Z Shu, J Guan, X Shen… - Proceedings of the 29th …, 2024 - dl.acm.org
This work is motivated by recent developments in Deep Neural Networks, particularly the
Transformer architectures underlying applications such as ChatGPT, and the need for …

NLTSP: A cost model for tensor program tuning using nested loop trees

X Qin, Y Li, F Lin, W Li - Journal of Systems Architecture, 2025 - Elsevier
This paper introduces NLTSP, a deep learning-based cost model designed to optimize
tensor program performance in deep learning compilers. NLTSP, short for Nested Loop Tree …