multiple multicore processors allow many independent threads to execute at once. At a finer-
grained level, each core contains a vector unit allowing multiple integer or floating-point
calculations to be performed with a single instruction. Additionally, GPU hardware is highly
parallel and performs best when processing large numbers of independent threads. At the
same time, tools such as CUDA have become steadily more abundant and mature, allowing …