Efficient reproducible floating point summation and BLAS

Extreme heterogeneity 2018 - productive computational science in the era of extreme heterogeneity: Report for DOE ASCR workshop on extreme heterogeneity

JS Vetter, R Brightwell, M Gokhale, P McCormick… - 2018 - osti.gov

The 2018 Basic Research Needs Workshop on Extreme Heterogeneity identified five Priority
Research Directions for realizing the capabilities needed to address the challenges posed …

Cited by 87

Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures

D Mukunoki, T Ogita, K Ozaki - … 2019, Bialystok, Poland, September 8–11 …, 2020 - Springer

Generally, floating-point computations comprise rounding errors; the result may be
inaccurate and not identical (non-reproducible). Particularly, heterogeneous computing has …

Cited by 26

Design and implementation of multiple-precision BLAS Level 1 functions for graphics processing units

K Isupov, V Knyazkov, A Kuvaev - Journal of Parallel and Distributed …, 2020 - Elsevier

Abstract Basic Linear Algebra Subprograms (BLAS) are the building blocks for various
numerical algorithms and are widely used in scientific computations. However, some linear …

Cited by 21

Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs

D Mukunoki, T Ogita - Journal of Computational and Applied Mathematics, 2020 - Elsevier

This paper presents the implementation, performance, and energy consumption of accurate
and mixed-precision linear algebra kernels, including inner-product (DOT), dense matrix …

Cited by 17

Agatha: Smart contract for DNN computation

Z Zheng, P Xie, X Zhang, S Chen, Y Chen… - arXiv preprint arXiv …, 2021 - arxiv.org

Smart contract is one of the core features of Ethereum and has inspired many blockchain
descendants. Since its advent, the verification paradigm of smart contract has been …

Cited by 8

Toward accurate and fast summation

M Lange - ACM Transactions on Mathematical Software (TOMS), 2022 - dl.acm.org

We introduce a new accurate summation algorithm based on the error-free summation into
floating-point buckets. Our algorithm exploits ideas from Zhu and Hayes' OnlineExactSum …

Cited by 6

Accurate matrix multiplication on binary128 format accelerated by Ozaki scheme

D Mukunoki, K Ozaki, T Ogita, T Imamura - Proceedings of the 50th …, 2021 - dl.acm.org

Although IEEE 754-2008 binary128 (with a 15-bit exponent and 113-bit significand, i.e.,
quadruple-precision) is not currently implemented on x86 in hardware, software emulation is …

Cited by 7

Augmented arithmetic operations proposed for IEEE-754 2018

J Riedy, J Demmel - 2018 IEEE 25th Symposium on Computer …, 2018 - ieeexplore.ieee.org

Algorithms for extending arithmetic precision through compensated summation or
arithmetics like double-double rely on operations commonly called twoSum and twoProduct …

Cited by 17

Floating-point arithmetic

S Boldo, CP Jeannerod, G Melquiond, JM Muller - Acta Numerica, 2023 - cambridge.org

Floating-point numbers have an intuitive meaning when it comes to physics-based
numerical computations, and they have thus become the most common way of …

Cited by 25

Enabling Bitwise Reproducibility for the Unstructured Computational Motif

B Siklósi, GR Mudalige, IZ Reguly - Applied Sciences, 2024 - mdpi.com

In this paper we identify the causes of numerical non-reproducibility in the unstructured
mesh computational motif, a class of algorithms commonly used for the solution of PDEs. We …
