Parallel memory-efficient computation of symmetric higher-order joint moment tensors

Z Li, H Kolla, ET Phipps - Proceedings of the Platform for Advanced …, 2022 - dl.acm.org
Proceedings of the Platform for Advanced Scientific Computing Conference, 2022 - dl.acm.org
The decomposition of higher-order joint cumulant tensors of spatio-temporal data sets is useful in analyzing multivariate non-Gaussian statistics, with a wide variety of applications (e.g. anomaly detection, independent component analysis, dimensionality reduction). Computing the cumulant tensor often requires first computing the joint moment tensor of the input data, which is very expensive with a naïve algorithm. The current state-of-the-art algorithm exploits the symmetry of a moment tensor by dividing it into smaller cubic tensor blocks and computing only the blocks with unique values, thereby reducing computation. We propose a refactoring of this algorithm that poses its computation as matrix operations, specifically Khatri-Rao products and standard matrix multiplications. An analysis of the computational and cache complexity indicates significant performance savings due to the refactoring. Implementations of our refactored algorithm in Julia show speedups of up to 10x over the reference algorithm in single-processor experiments. We describe multiple levels of hierarchical parallelism inherent in the refactored algorithm, and present an implementation using an advanced programming model that shows similar speedups in experiments run on a GPU.
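The idea of posing a moment tensor as a Khatri-Rao product followed by a matrix multiplication can be illustrated with a minimal NumPy sketch. This is not the authors' blocked, symmetry-exploiting algorithm or their Julia code. It only shows, for an unblocked third-order moment tensor, that the element-wise definition and the matrix formulation agree. The function names, the data shape, and the use of uncentered moments are assumptions made for the illustration.

```python
import numpy as np

def rowwise_khatri_rao(a, b):
    """Row-wise Khatri-Rao product: row t of the result is kron(a[t], b[t])."""
    n = a.shape[0]
    return (a[:, :, None] * b[:, None, :]).reshape(n, -1)

def third_moment_naive(x):
    """Element-wise definition: M[i, j, k] = mean over t of x[t,i]*x[t,j]*x[t,k]."""
    n = x.shape[0]
    return np.einsum("ti,tj,tk->ijk", x, x, x) / n

def third_moment_matmul(x):
    """Same tensor, computed as one Khatri-Rao product and one matrix multiply."""
    n, d = x.shape
    z = rowwise_khatri_rao(x, x)   # shape (n, d*d), z[t, j*d + k] = x[t,j]*x[t,k]
    unfolded = x.T @ z / n         # shape (d, d*d), the mode-1 unfolding of M
    return unfolded.reshape(d, d, d)

# Hypothetical data: 200 observations of 4 variables (rows are observations).
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 4))

m_naive = third_moment_naive(x)
m_fast = third_moment_matmul(x)
```

Because the heavy work in `third_moment_matmul` is a single dense matrix multiplication, it can be delegated to an optimized BLAS routine. The sketch computes every entry of the tensor, including the redundant symmetric ones, whereas the algorithm described in the abstract computes only the unique blocks.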