the Compute Unified Device Architecture (CUDA), which provides a viable solution for
accelerating a broad class of applications. The parallel prefix sum function is an essential
building block for many data mining algorithms, so optimizing it speeds up the data mining
process as a whole. Finally, we benchmark and evaluate the performance of the
optimized parallel prefix sum building block in CUDA.
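
For readers unfamiliar with the operation, the following is a minimal sequential C++ sketch of an exclusive prefix sum. It assumes the widely used up-sweep/down-sweep (work-efficient) formulation, which may differ from the optimized CUDA implementation evaluated here. The function name `exclusive_scan_blelloch` and the zero-padding strategy are illustrative assumptions, not taken from the paper. Within each level, the inner-loop iterations are independent, which is the property a GPU version would exploit by assigning them to separate threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
// Sequential reference of the up-sweep/down-sweep scheme (hypothetical
// helper for illustration; not the paper's CUDA kernel).
std::vector<int> exclusive_scan_blelloch(const std::vector<int>& in) {
    const std::size_t n = in.size();
    if (n == 0) return {};

    // Pad with zeros to the next power of two so the tree is balanced.
    std::size_t m = 1;
    while (m < n) m *= 2;
    assert(m >= n);
    std::vector<int> a(m, 0);
    for (std::size_t i = 0; i < n; ++i) a[i] = in[i];

    // Up-sweep (reduce): build partial sums in place, level by level.
    for (std::size_t stride = 1; stride < m; stride *= 2) {
        // Iterations at one level touch disjoint elements, so a GPU
        // could run them concurrently.
        for (std::size_t i = 2 * stride - 1; i < m; i += 2 * stride) {
            a[i] += a[i - stride];
        }
    }

    // Down-sweep: clear the root, then push prefixes back down the tree.
    a[m - 1] = 0;
    for (std::size_t stride = m / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < m; i += 2 * stride) {
            const int left = a[i - stride];
            a[i - stride] = a[i];  // left child receives the parent's prefix
            a[i] += left;          // right child adds the left subtree's sum
        }
    }

    a.resize(n);  // drop the padding
    return a;
}
```

For example, calling `exclusive_scan_blelloch({1, 2, 3})` returns `{0, 1, 3}`. The two sweeps perform O(n) additions in total across O(log n) levels, which is why this formulation is a common starting point for GPU scans.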