Parallel Tucker decomposition with numerically accurate SVD

Z Li, Q Fang, G Ballard - … of the 50th International Conference on Parallel …, 2021 - dl.acm.org
Z Li, Q Fang, G Ballard
Proceedings of the 50th International Conference on Parallel Processing, 2021dl.acm.org
Tucker decomposition is a low-rank tensor approximation that generalizes a truncated matrix
singular value decomposition (SVD). Existing parallel software has shown that Tucker
decomposition is particularly effective at compressing terabyte-sized multidimensional
scientific simulation datasets, computing reduced representations that satisfy a specified
approximation error. The general approach is to get a low-rank approximation of the input
data by performing a sequence of matrix SVDs of tensor unfoldings, which tend to be short …
Tucker decomposition is a low-rank tensor approximation that generalizes a truncated matrix singular value decomposition (SVD). Existing parallel software has shown that Tucker decomposition is particularly effective at compressing terabyte-sized multidimensional scientific simulation datasets, computing reduced representations that satisfy a specified approximation error. The general approach is to get a low-rank approximation of the input data by performing a sequence of matrix SVDs of tensor unfoldings, which tend to be short-fat matrices. In the existing approach, the SVD is performed by computing the eigendecomposition of the Gram matrix of the unfolding. This method sacrifices some numerical stability in exchange for lower computation costs and easier parallelization. We propose using a more numerically stable though more computationally expensive way to compute the SVD by preprocessing with a QR decomposition step and computing an SVD of only the small triangular factor. The more numerically stable approach allows us to achieve the same accuracy with half the working precision (for example, single rather than double precision). We demonstrate that our method scales as well as the existing approach, and the use of lower precision leads to an overall reduction in running time of up to a factor of 2 when using 10s to 1000s of processors. Using the same working precision, we are also able to compute Tucker decompositions with much smaller approximation error.
ACM Digital Library
以上显示的是最相近的搜索结果。 查看全部搜索结果