Breadth-first search on heterogeneous platforms: A case of study on social networks

L Remis, MJ Garzaran, R Asenjo… - 2016 28th International …, 2016 - ieeexplore.ieee.org
2016 28th International Symposium on Computer Architecture and …, 2016ieeexplore.ieee.org
Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in
many problems, such as social network, computer network analysis, and data organization.
BFS is an iterative algorithm that due to its irregular behavior is quite challenging to
parallelize. Several approaches implement efficient algorithms for BFS for multicore
architectures and for Graphics Processors, but it is still an open problem how to distribute the
work among the main cores and the accelerators. In this paper, we assess several …
Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in many problems, such as social network, computer network analysis, and data organization. BFS is an iterative algorithm that due to its irregular behavior is quite challenging to parallelize. Several approaches implement efficient algorithms for BFS for multicore architectures and for Graphics Processors, but it is still an open problem how to distribute the work among the main cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogenous architectures (highend and embedded mobile processors composed of a multi-core CPU and an integrated GPU) with a focus on social network graphs. In particular, we propose two heterogenous approaches to exploit both devices. The first one, called Selective, selects on which device to execute each iteration. It is based on a previous approach, but we have adapted it to take advantage of the features of social network graphs (fewer iterations but more unbalanced). The second approach, referred as Concurrent, allows the execution of specific iterations concurrently in both devices. Our heterogenous implementations can be up to 1.56x faster and 1.32x more energy efficient with respect to the best of only-CPU or only-GPU baselines. We have also found that for a highly memory bound problem like BFS, the CPU-GPU collaborative execution is limited by the shared-memory bus bandwidth.
ieeexplore.ieee.org
以上显示的是最相近的搜索结果。 查看全部搜索结果