Interactive supercomputing on 40,000 cores for machine learning and data analysis

A Reuther, J Kepner, C Byun, S Samsi… - 2018 IEEE High …, 2018 - ieeexplore.ieee.org
Interactive massively parallel computations are critical for machine learning and data
analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing …

Exploring serverless computing for neural network training

L Feng, P Kudva, D Da Silva… - 2018 IEEE 11th …, 2018 - ieeexplore.ieee.org
Serverless or functions as a service runtimes have shown significant benefits to efficiency
and cost for event-driven cloud applications. Although serverless runtimes are limited to …

Applied machine learning at Facebook: A datacenter infrastructure perspective

K Hazelwood, S Bird, D Brooks… - … symposium on high …, 2018 - ieeexplore.ieee.org
Machine learning sits at the core of many essential products and services at Facebook. This
paper describes the hardware and software infrastructure that supports machine learning at …

Think fast: A tensor streaming processor (TSP) for accelerating deep learning workloads

D Abts, J Ross, J Sparling… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a
functionally-sliced microarchitecture with memory units interleaved with vector and matrix deep learning …

λ-NIC: Interactive serverless compute on programmable SmartNICs

S Choi, M Shahbaz, B Prabhakar… - 2020 IEEE 40th …, 2020 - ieeexplore.ieee.org
There is a growing interest in serverless compute, a cloud computing model that automates
infrastructure resource-allocation and management while billing customers only for the …

FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud

S Karandikar, H Mao, D Kim, D Biancolin… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
We present FireSim, an open-source simulation platform that enables cycle-exact
microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated …

Entropy-aware I/O pipelining for large-scale deep learning on HPC systems

Y Zhu, F Chowdhury, H Fu, A Moody… - 2018 IEEE 26th …, 2018 - ieeexplore.ieee.org
Deep neural networks have recently gained tremendous interest due to their capabilities in a
wide variety of application areas such as computer vision and speech recognition. Thus it is …

Characterizing deep-learning I/O workloads in TensorFlow

SWD Chien, S Markidis, CP Sishtla… - 2018 IEEE/ACM 3rd …, 2018 - ieeexplore.ieee.org
The performance of Deep-Learning (DL) computing frameworks relies on the performance of
data ingestion and checkpointing. In fact, during the training, a considerable high number of …

Dojo: The microarchitecture of Tesla's exa-scale computer

E Talpes, D Williams, DD Sarma - 2022 IEEE Hot Chips 34 …, 2022 - computer.org
The Microarchitecture of Tesla's ExaScale Computer. Emil Talpes, Douglas Williams,
Debjit Das Sarma. 2022 IEEE Hot Chips 34 Symposium (HCS) | 978-1-6654-6028 …

Improving big data visual analytics with interactive virtual reality

A Moran, V Gadepally, M Hubbell… - 2015 IEEE high …, 2015 - ieeexplore.ieee.org
For decades, the growth and volume of digital data collection has made it challenging to
digest large volumes of information and extract underlying structure. Coined 'Big Data' …