Extreme Heterogeneity 2018 - Productive computational science in the era of extreme heterogeneity: Report for DOE ASCR workshop on extreme heterogeneity

JS Vetter, R Brightwell, M Gokhale, P McCormick… - 2018 - osti.gov
The 2018 Basic Research Needs Workshop on Extreme Heterogeneity identified five Priority
Research Directions for realizing the capabilities needed to address the challenges posed …

Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures

D Mukunoki, T Ogita, K Ozaki - … 2019, Bialystok, Poland, September 8–11 …, 2020 - Springer
Generally, floating-point computations comprise rounding errors; the result may be
inaccurate and not identical (non-reproducible). Particularly, heterogeneous computing has …

Design and implementation of multiple-precision BLAS Level 1 functions for graphics processing units

K Isupov, V Knyazkov, A Kuvaev - Journal of Parallel and Distributed …, 2020 - Elsevier
Abstract Basic Linear Algebra Subprograms (BLAS) are the building blocks for various
numerical algorithms and are widely used in scientific computations. However, some linear …

Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs

D Mukunoki, T Ogita - Journal of Computational and Applied Mathematics, 2020 - Elsevier
This paper presents the implementation, performance, and energy consumption of accurate
and mixed-precision linear algebra kernels, including inner-product (DOT), dense matrix …

Agatha: Smart contract for DNN computation

Z Zheng, P Xie, X Zhang, S Chen, Y Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
Smart contract is one of the core features of Ethereum and has inspired many blockchain
descendants. Since its advent, the verification paradigm of smart contract has been …

Toward accurate and fast summation

M Lange - ACM Transactions on Mathematical Software (TOMS), 2022 - dl.acm.org
We introduce a new accurate summation algorithm based on the error-free summation into
floating-point buckets. Our algorithm exploits ideas from Zhu and Hayes' OnlineExactSum …

Accurate matrix multiplication on binary128 format accelerated by Ozaki scheme

D Mukunoki, K Ozaki, T Ogita, T Imamura - Proceedings of the 50th …, 2021 - dl.acm.org
Although IEEE 754-2008 binary128 (with a 15-bit exponent and 113-bit significand, i.e.,
quadruple-precision) is not currently implemented on x86 in hardware, software emulation is …

Augmented arithmetic operations proposed for IEEE-754 2018

J Riedy, J Demmel - 2018 IEEE 25th Symposium on Computer …, 2018 - ieeexplore.ieee.org
Algorithms for extending arithmetic precision through compensated summation or
arithmetics like double-double rely on operations commonly called twoSum and twoProduct …

Floating-point arithmetic

S Boldo, CP Jeannerod, G Melquiond, JM Muller - Acta Numerica, 2023 - cambridge.org
Floating-point numbers have an intuitive meaning when it comes to physics-based
numerical computations, and they have thus become the most common way of …

Enabling Bitwise Reproducibility for the Unstructured Computational Motif

B Siklósi, GR Mudalige, IZ Reguly - Applied Sciences, 2024 - mdpi.com
In this paper we identify the causes of numerical non-reproducibility in the unstructured
mesh computational motif, a class of algorithms commonly used for the solution of PDEs. We …