Floating-point computations generally incur rounding errors, so results may be inaccurate and may differ from one run or platform to another (non-reproducible). Heterogeneous computing in particular involves many factors that affect reproducibility. This loss of accuracy and reproducibility can be a serious obstacle when debugging complex codes and can undermine the reliability of computations. In this paper, we propose high-performance implementations of reproducible basic linear algebra subprograms (BLAS) routines with tunable accuracy for many-core architectures. Our approach is based on an accurate matrix-multiplication method, the Ozaki scheme, which can be built on level-3 BLAS routines that perform standard floating-point operations. We evaluate the performance of three routines, inner product (DOT), matrix-vector multiplication (GEMV), and matrix-matrix multiplication (GEMM), on an NVIDIA Volta GPU by comparing them with the standard routines provided by the vendor. Furthermore, we demonstrate the accuracy of the routines and the reproducibility of their results between CPU and GPU.
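As a minimal illustration of the reproducibility problem described above, the following Python sketch shows how the result of an inner product can depend on the order of accumulation, and how an order-independent result can be obtained. It is not the Ozaki scheme and not the implementation proposed in this paper: exact rational arithmetic is swapped in only because it is short and easy to verify, and it is far too slow for practical use. The names `naive_dot` and `reproducible_dot` are hypothetical and chosen for this example.

```python
from fractions import Fraction


def naive_dot(x, y):
    """Left-to-right inner product in double precision.

    Every addition rounds, so the result can change with the summation order.
    """
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def reproducible_dot(x, y):
    """Order-independent inner product (illustrative assumption, not the paper's method).

    Each double converts to a Fraction exactly, so the products and their sum
    are exact; the only rounding is the final conversion back to a double.
    """
    total = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    return float(total)


# The same three terms in two different orders.
y = [1.0, 1.0, 1.0]
order_a = [2.0**53, 1.0, -(2.0**53)]
order_b = [2.0**53, -(2.0**53), 1.0]

print(naive_dot(order_a, y), naive_dot(order_b, y))                # results differ
print(reproducible_dot(order_a, y), reproducible_dot(order_b, y))  # results agree
```

In the first ordering, adding 1.0 to 2^53 is rounded away before the large terms cancel, so the naive routine loses the small term entirely. A parallel reduction on a GPU and a sequential loop on a CPU typically accumulate in different orders, which is one way such discrepancies can arise across platforms.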