W Dawson, K Ozaki, J Domke… - Journal of Chemical …, 2024 - ACS Publications
The abundant demand for deep learning compute resources has created a renaissance in low-precision hardware. Going forward, it will be essential for simulation software to run on …
This paper proposes a method for implementing dense matrix multiplication on FP64 (DGEMM) and FP32 (SGEMM) using Tensor Cores on NVIDIA's graphics processing units …
Parallel implementations of Krylov subspace methods often help to accelerate the procedure of finding an approximate solution of a linear system. However, such parallelization coupled …
H Ootomo, K Ozaki, R Yokota - The International Journal of …, 2024 - journals.sagepub.com
Deep learning hardware achieves high throughput and low power consumption by reducing computing precision and specializing in matrix multiplication. For machine learning …
Although IEEE 754-2008 binary128 (with a 15-bit exponent and 113-bit significand, i.e., quadruple-precision) is not currently implemented on x86 in hardware, software emulation is …
The Preconditioned Conjugate Gradient method is often used in numerical simulations. While being widely used, the solver is also known for its lack of accuracy while …
R Iakymchuk, MB Vayá, S Graillat… - … Journal of High …, 2020 - journals.sagepub.com
The Preconditioned Conjugate Gradient method is often employed for the solution of linear systems of equations arising in numerical simulations of physical phenomena. While being …
NM Evstigneev, OI Ryabkov, AN Bocharov… - … of Computational and …, 2022 - Elsevier
The aim of the paper is to improve parallel algorithms that obtain higher precision in floating point reduction-type operations while working within the basic floating point type. The …
D Mukunoki, K Ozaki, T Ogita… - … Conference on High …, 2021 - dl.acm.org
On Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of computational accuracy caused …