The deep learning compiler: A comprehensive survey

M Li, Y Liu, X Liu, Q Sun, X You, H Yang… - … on Parallel and …, 2020 - ieeexplore.ieee.org
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has
boosted the research and development of DL compilers in the community. Several DL …

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Ten lessons from three generations shaped Google's TPUv4i: Industrial product

NP Jouppi, DH Yoon, M Ashcraft… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Google deployed several TPU generations since 2015, teaching us lessons that changed
our views: semiconductor technology advances unequally; compiler compatibility trumps …

Filtering, distillation, and hard negatives for vision-language pre-training

F Radenovic, A Dubey, A Kadian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models trained with contrastive learning on large-scale noisy data are
becoming increasingly popular for zero-shot recognition problems. In this paper we improve …

A domain-specific supercomputer for training deep neural networks

NP Jouppi, DH Yoon, G Kurian, S Li, N Patil… - Communications of the …, 2020 - dl.acm.org
DOI: 10.1145/3360307. Google's TPU …

[BOOK] Efficient processing of deep neural networks

V Sze, YH Chen, TJ Yang, JS Emer - 2020 - Springer
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …

[PDF] Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

Pushing the limits of narrow precision inferencing at cloud scale with Microsoft Floating Point

B Darvish Rouhani, D Lo, R Zhao… - Advances in neural …, 2020 - proceedings.neurips.cc
In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of
datatypes developed for production cloud-scale inferencing on custom hardware. Through …

Multi-lingual evaluation of code generation models

B Athiwaratkun, SK Gouda, Z Wang, X Li, Y Tian… - arXiv preprint arXiv …, 2022 - arxiv.org
We present new benchmarks for evaluating code generation models: MBXP, Multilingual
HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are …

A 1ynm 1.25 V 8Gb, 16Gb/s/pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep-learning …

S Lee, K Kim, S Oh, J Park, G Hong… - … Solid-State Circuits …, 2022 - ieeexplore.ieee.org
With advances in deep-neural-network applications, the increasingly large data movement
through memory channels is becoming inevitable: specifically, RNN and MLP applications …