The deep learning compiler: A comprehensive survey

M Li, Y Liu, X Liu, Q Sun, X You, H Yang… - … on Parallel and …, 2020 - ieeexplore.ieee.org
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has
boosted the research and development of DL compilers in the community. Several DL …

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Ten lessons from three generations shaped Google's TPUv4i: Industrial product

NP Jouppi, DH Yoon, M Ashcraft… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Google deployed several TPU generations since 2015, teaching us lessons that changed
our views: semiconductor technology advances unequally; compiler compatibility trumps …

Filtering, distillation, and hard negatives for vision-language pre-training

F Radenovic, A Dubey, A Kadian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models trained with contrastive learning on large-scale noisy data are
becoming increasingly popular for zero-shot recognition problems. In this paper we improve …

A domain-specific supercomputer for training deep neural networks

NP Jouppi, DH Yoon, G Kurian, S Li, N Patil… - Communications of the …, 2020 - dl.acm.org
DOI: 10.1145/3360307. Google's TPU …

[BOOK] Efficient processing of deep neural networks

V Sze, YH Chen, TJ Yang, JS Emer - 2020 - Springer
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …

[PDF] Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

Pushing the limits of narrow precision inferencing at cloud scale with Microsoft Floating Point

B Darvish Rouhani, D Lo, R Zhao… - Advances in neural …, 2020 - proceedings.neurips.cc
In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of
datatypes developed for production cloud-scale inferencing on custom hardware. Through …

Multi-lingual evaluation of code generation models

B Athiwaratkun, SK Gouda, Z Wang, X Li, Y Tian… - arXiv preprint arXiv …, 2022 - arxiv.org
We present new benchmarks for evaluating code generation models: MBXP, Multilingual
HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are …

A 1ynm 1.25 V 8Gb, 16Gb/s/pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep-learning …

S Lee, K Kim, S Oh, J Park, G Hong… - … Solid-State Circuits …, 2022 - ieeexplore.ieee.org
With advances in deep-neural-network applications, the increasingly large data movement
through memory channels is becoming inevitable: specifically, RNN and MLP applications …