Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning

Y Guan, C Yu, Y Zhou, J Leng, C Li, M Guo - Proceedings of the 29th …, 2024 - dl.acm.org
Model pruning, which eliminates redundant parameters and reduces computational
complexity, emerges as a viable strategy for efficient deep neural network (DNN) …

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

J Xu, R Zhang, C Guo, W Hu, Z Liu, F Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are widely used across various domains, processing
millions of daily requests. This surge in demand poses significant challenges in optimizing …

Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels

OA Alo, SS Vatsavai, I Thakkar - arXiv preprint arXiv:2407.06134, 2024 - arxiv.org
Deep Neural Networks (DNNs) predominantly rely on General Matrix Multiply (GEMM)
kernels, which are often accelerated using specialized hardware architectures. Recently …