A survey of power and energy predictive models in HPC systems and applications

K O'Brien, I Pietri, R Reddy, A Lastovetsky… - ACM Computing …, 2017 - dl.acm.org
Power and energy efficiency are now critical concerns in extreme-scale high-performance
scientific computing. Many extreme-scale computing systems today (for example: Top500) …

Google neural network models for edge devices: Analyzing and mitigating machine learning inference bottlenecks

A Boroumand, S Ghose, B Akin… - 2021 30th …, 2021 - ieeexplore.ieee.org
Emerging edge computing platforms often contain machine learning (ML) accelerators that
can accelerate inference for a wide range of neural network (NN) models. These models are …

System evaluation of the Intel Optane byte-addressable NVM

IB Peng, MB Gokhale, EW Green - Proceedings of the International …, 2019 - dl.acm.org
Byte-addressable non-volatile memory (NVM) features high density, DRAM-comparable
performance, and persistence. These characteristics position NVM as a promising new tier …

A survey on software methods to improve the energy efficiency of parallel computing

C Jin, BR de Supinski, D Abramson… - … Journal of High …, 2017 - journals.sagepub.com
Energy consumption is one of the top challenges for achieving the next generation of
supercomputing. Codesign of hardware and software is critical for improving energy …

Technology prospects for data-intensive computing

K Akarvardar, HSP Wong - Proceedings of the IEEE, 2023 - ieeexplore.ieee.org
For many decades, progress in computing hardware has been closely associated with
CMOS logic density, performance, and cost. As such, slowdown in 2-D scaling, frequency …

TPU-KNN: K nearest neighbor search at peak FLOP/s

F Chern, B Hechtman, A Davis, R Guo… - Advances in …, 2022 - proceedings.neurips.cc
This paper presents a novel nearest neighbor search algorithm achieving TPU (Google
Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms …

Applying the roofline model

G Ofenbeck, R Steinmann, V Caparros… - … Analysis of Systems …, 2014 - ieeexplore.ieee.org
The recently introduced roofline model plots the performance of executed code against its
operational intensity (operations count divided by memory traffic). It also includes two …

Roofline model toolkit: A practical tool for architectural and program analysis

YJ Lo, S Williams, B Van Straalen, TJ Ligocki… - … , and Simulation: 5th …, 2015 - Springer
We present preliminary results of the Roofline Toolkit for multicore, manycore, and
accelerated architectures. This paper focuses on the processor architecture characterization …

Reducing PageRank communication via propagation blocking

S Beamer, K Asanović… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Reducing communication is an important objective, as it can save energy or improve the
performance of a communication-bound application. The graph algorithm PageRank …

Measuring GPU power with the K20 built-in sensor

M Burtscher, I Zecena, Z Zong - Proceedings of Workshop on General …, 2014 - dl.acm.org
GPU-accelerated programs are becoming increasingly common in HPC, personal
computers, and even handheld devices, making it important to optimize their energy …